This is a common garden experiment in which individual fish (FISH_ID) were exposed to four different TEMPERATURE treatments (i.e., 27\(^\circ\)C, 28.5\(^\circ\)C, 30\(^\circ\)C, and 31.5\(^\circ\)C). Fish sampled in this experiment were collected from six different POPULATIONS spread across two REGIONS, a low- and a high-latitude region (i.e., three populations from each region).
Fish were first sampled at 27\(^\circ\)C and then subsequently at higher temperatures. Once all fish were tested at the set treatment temperature, the treatment temperature was increased by 1.5\(^\circ\)C over three days. Fish were then given an additional five days to adjust to the temperature increase.
To determine the MAX metabolic rate, fish were placed in a swim tunnel for ten minutes. The first five minutes were used to slowly increase the water flow within the swim tunnel until fish reached the point where they would alternate between pectoral swimming and lateral body undulations (i.e., gait change). The water flow within the swim tunnel was maintained at gait-change speed for an additional five minutes. Afterwards fish were immediately placed within a designated respirometry CHAMBER, with air saturation rates being monitored for 3.5-4 hours on a four-minute measure, three-minute flush, and five-second wait cycle.
Maximum metabolic rate was determined by extracting the steepest sixty-second interval slope from the first (sometimes, but rarely, second or third) measurement period. RESTING metabolic rate was determined by extracting the ten shallowest slopes that were recorded over the length of the experiment.
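The steepest-slope idea can be sketched with a rolling regression. This is a minimal illustration on simulated data, not the actual pipeline (the analysis below uses the respR package); the column names and 1 Hz sampling rate are assumptions.

```r
# Fit a linear regression in every rolling 60 s window of a simulated O2
# trace and keep the steepest (most negative) slope.
set.seed(1)
o2 <- data.frame(time_s = 0:599)                      # 10 min trace at 1 Hz
o2$sat <- 100 - 0.05 * o2$time_s + rnorm(600, sd = 0.2)

window <- 60                                          # 60 s window
slopes <- sapply(seq_len(nrow(o2) - window), function(i) {
  rows <- i:(i + window)
  coef(lm(sat ~ time_s, data = o2[rows, ]))[["time_s"]]
})
steepest <- min(slopes)                               # most negative slope
steepest
```

The shallowest slopes (for resting rate) would be extracted the same way with `max()` or by sorting `slopes` and keeping the ten closest to zero.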
Fish were sampled in random order. The order that fish were sampled in determined the SUMP/SYSTEM the fish was run on as well as the chamber it was placed in (i.e., the first fish sampled went into chamber 1 on sump/system 1, the second fish into chamber 1 on sump/system 2, the third fish into chamber 2 on sump/system 1, etc.).
Immunocompetence was tested via phytohaemagglutinin (PHA) swelling assays at the same four experimental temperatures at which metabolic performance was tested. To perform the assay, fish were injected with 0.03 mL of PHA subcutaneously in the caudal peduncle. Thickness of the injection site was measured pre-injection as well as 18-24 hours post-injection. PHA produces a localized, cell-mediated response (e.g., inflammation, T-cell proliferation, etc.). The change in thickness between measurement periods was used as a proxy for immunocompetence.
Two weeks after live animal testing concluded, blood and tissue samples were collected from each fish. White muscle tissue samples were used to assess the activity levels of two different enzymes: lactate dehydrogenase (LDH; anaerobic) and citrate synthase (CS; aerobic). Blood samples were used to determine haematocrit ratios.
| EXP_FISH_ID | Combined FISH_ID with TEMPERATURE the fish was tested at |
| FISH_ID | Unique alphanumeric code provided to fish |
| POPULATION | Population/Reef the fish was collected from |
| REGION | Region (i.e. core or leading) fish was collected from |
| TEMPERATURE | Temperature fish was tested at |
| MASS | Mass of the fish |
| RESTING_DATE | Date the resting metabolic rate was recorded for each fish |
| RESTING_CHAMBER | Respirometry chamber the fish was tested in for RMR |
| RESTING_SYSTEM | Respirometry system (i.e. dell or asus) fish was tested with |
| RESTING_SUMP | Respirometry sump (i.e., 1 or 2) fish was tested in |
| RESTING_AM_PM | Time of day (i.e. morning or afternoon) fish was tested |
| RESTING_START_TIME | Time that the fish was placed inside the respirometry chamber |
| RESTING | Resting metabolic rate (RMR) |
| RESTING_MgO2.hr | Resting metabolic rate divided by mass |
| MAX_DATE | Date that maximum metabolic rate was recorded |
| MAX_CHAMBER | Respirometry chamber the fish was tested in for MMR |
| MAX_SYSTEM | Respirometry system fish was tested with for MMR |
| MAX_SUMP | Respirometry sump (i.e., 1 or 2) fish was tested in for MMR |
| MAX_AM_PM | Time of day (i.e. morning or afternoon) fish was tested for MMR |
| MAX_START_TIME | Time that the fish was placed inside the chamber for MMR |
| MAX | Maximum metabolic rate (MMR) |
| MAX_MgO2.hr | Maximum metabolic rate divided by mass |
| FAS | Factorial aerobic scope (MMR/RMR) |
| NAS | Net aerobic scope (MMR - RMR) |
| MgO2.hr_Net | Net metabolic rate divided by mass |
| Swim.performance | Fish swim performance in swim tunnel (i.e., good, okay, poor) |
| Notes | Additional experimental notes |
| MASS_CENTERED | Mass of fish (centered) |
Let's start by loading the packages that are needed.
library(tidyverse) # data manipulation
library(janitor) # data manipulation
library(plyr) # data manipulation
library(dplyr) # data manipulation
library(lubridate) # data manipulation - specifically time data
library(chron) # data manipulation - specifically time data
library(glmmTMB) # running models
library(performance) # model validation
library(DHARMa) # model validation
library(MuMIn) # model validation
library(modelr) # model validation
library(car) # used for Anova function
library(emmeans) # post-hoc analysis
library(kableExtra) # creating tables
library(vtable) # creating tables
library(ggplot2) # plotting figures
library(ggeffects) # plotting models/model validation
library(sjPlot) # plotting models
library(ggpubr) # plotting figures
library(broom) # tidying model outputs

For details on the experiment performed please read the information at the top of this document. In brief, Acanthochromis polyacanthus from two different regions on the Great Barrier Reef (GBR) were tested for metabolic performance at four different temperatures: 27\(^\circ\)C, 28.5\(^\circ\)C, 30\(^\circ\)C, and 31.5\(^\circ\)C. Fish used in this study were collected from two different regions, low- (i.e., Cairns) and high-latitude (i.e., Mackay); within each region fish were collected from a total of three different populations. At each temperature, resting and maximum oxygen consumption were measured for each individual. Absolute aerobic scope was calculated using the following formula:
Absolute aerobic scope = (maximum oxygen consumption - resting oxygen consumption)
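The formula above, and the factorial scope used later (FAS = MMR/RMR), can be shown with a tiny worked example; the two rate values here are made up for illustration.

```r
# Net (absolute) aerobic scope is MMR - RMR; factorial scope is MMR / RMR.
rmr <- 8.3   # resting oxygen consumption (MgO2 per hr), hypothetical
mmr <- 19.6  # maximum oxygen consumption (MgO2 per hr), hypothetical

nas <- mmr - rmr  # NAS: net aerobic scope
fas <- mmr / rmr  # FAS: factorial aerobic scope
c(NAS = nas, FAS = round(fas, 2))
```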
Individuals were first tested at 27\(^\circ\)C. Water temperature was then increased at a rate of 0.5\(^\circ\)C per day until the next temperature was reached. Fish were then provided with an additional 5 days to adjust to the new temperature before aerobic physiology was tested again.
Three traits are included within the aerobic physiology analysis: resting oxygen consumption, maximum oxygen consumption, and absolute aerobic scope. Data for each metric were collected from respirometry experiments recorded via a combination of programs, including AquaResp and PyroScience. Slopes (i.e., resting and maximum oxygen consumption values) were then calculated via the respR [https://januarharianto.github.io/respR/articles/respR.html] package.
Before beginning, always make sure that you are working in the correct directory.
Now we can import the data. Replace import data with the PATH to your data file. I have simply labelled my PATH import.data (i.e. import.data = “PATH TO MY FILE”).
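A sketch of what that import step looks like; the file name and columns below are invented, and a tiny CSV is written to a temporary file so the example runs on its own. With real data, point `import.data` at your file instead.

```r
# Stand-in for "PATH TO MY FILE": write a two-line CSV to a temp file.
import.data <- tempfile(fileext = ".csv")
writeLines(c("FISH_ID,WEIGHT", "LCHA001_27,0.012"), import.data)

# Read it back in; the real workflow reads the respirometry data the same way.
resp <- read.csv(import.data, stringsAsFactors = FALSE)
str(resp)
```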
Before the data can be analysed it is important to clean up the data file. Below a number of adjustments are made, primarily making sure that columns are treated appropriately as either factors, numeric, or time, as well as renaming some columns. Once these changes are made the data is saved into a new dataframe called resp2.
resp2 = resp %>%
dplyr::rename(EXP_FISH_ID = FISH_ID) %>%
separate(EXP_FISH_ID, c("FISH_ID"), remove = FALSE) %>%
mutate(FISH_ID = factor(FISH_ID),
POPULATION = factor(POPULATION),
REGION = factor(REGION),
TEMPERATURE = as.numeric(TEMPERATURE), #run with temperature as a factor
RESTING_DATE = factor(RESTING_DATE),
RESTING_CHAMBER = factor(RESTING_CHAMBER),
RESTING_SYSTEM = factor(RESTING_SYSTEM),
RESTING_SUMP = factor(RESTING_SUMP),
RESTING_AM_PM = factor(RESTING_AM_PM),
RESTING_START_TIME = hms(RESTING_START_TIME),
RESTING_END_TIME = hms(RESTING_ENDTIME),
MAX_DATE = factor(MAX_DATE),
MAX_CHAMBER = factor(MAX_CHAMBER),
MAX_SYSTEM = factor(MAX_SYSTEM),
MAX_SUMP = factor(MAX_SUMP),
MAX_AM_PM = factor(MAX_AM_PM),
MAX_START_TIME = hms(MAX_START_TIME),
Swim.performance = factor(Swim.performance),
NAS = as.numeric(NAS),
FAS = as.numeric(FAS),
MgO2.hr_Net = as.numeric(MgO2.hr_Net),
RESTING_RUNTIME_SECONDS = as.numeric(hms(RESTING_RUNTIME))) %>%
dplyr::rename(MASS = WEIGHT) %>%
mutate(MASS_CENTERED = scale(MASS, scale = FALSE, center = TRUE))

Next, select data points will be removed. Beside the removed data points I have provided the reasoning for their exclusion, such as the fish died during the experiment, or the data quality was poor - which likely indicates that there was an issue with the equipment during the trial.
resp3 <- resp2 %>%
subset(
EXP_FISH_ID !="LCHA127_27" & # deceased during experiment
EXP_FISH_ID !="LCHA132_27" & # deceased during experiment
EXP_FISH_ID !="LKES168_27" # poor data quality
)

Great! That is everything for data manipulation.
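The chain of `!=` comparisons above can also be written with `%in%`, which is a little easier to extend as more exclusions accumulate. A sketch on a toy vector (the real call would filter resp2 on EXP_FISH_ID):

```r
# IDs to drop, with the same exclusion reasons as above.
excluded <- c("LCHA127_27",  # deceased during experiment
              "LCHA132_27",  # deceased during experiment
              "LKES168_27")  # poor data quality

ids  <- c("LCHA127_27", "CSUD008_27", "LKES168_27", "CVLA052_27")
kept <- ids[!ids %in% excluded]
kept
```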
ggplot(resp3, aes(MASS, RESTING_MgO2.hr_RESPR)) +
geom_point() +
geom_smooth(method = "lm") +
theme_classic()
ggplot(resp3, aes(MASS, RESTING_MgO2.hr_RESPR, color = REGION)) +
geom_point() +
theme_classic() +
geom_smooth(method = "lm")
ggplot(resp3, aes(TEMPERATURE, RESTING_MgO2.hr_RESPR, color = REGION)) +
geom_point() +
theme_classic()

The model was fit using the glm and later the glmmTMB package in R. A number of different models were tested to determine which hypothesis and associated variables best predicted resting oxygen consumption. Model fit was examined using AICc, BIC, and r-squared values. Additional model validation was performed via the diagnostics provided by the performance and DHARMa packages in R.
The first set of models tested looked at two different hypotheses: 1) that mass has a major impact on the resting oxygen consumption of fish (this has been documented in the literature), and 2) that variables related to the timing and equipment of trials have an important impact on the resting oxygen consumption of fish.
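The comparison tables below can be assembled in just a few lines. This sketch uses simulated data and base R's `AIC()`/`BIC()` for portability; the document itself reports AICc (from MuMIn), which adds a small-sample correction to AIC.

```r
# Two candidate gaussian GLMs on simulated data, compared side by side.
set.seed(2)
d  <- data.frame(x = rnorm(50), z = rnorm(50))
d$y <- 2 * d$x + rnorm(50)

m1 <- glm(y ~ x,     family = gaussian(), data = d)
m2 <- glm(y ~ x + z, family = gaussian(), data = d)

tab <- data.frame(model = c("m1", "m2"),
                  df    = c(attr(logLik(m1), "df"), attr(logLik(m2), "df")),
                  AIC   = c(AIC(m1), AIC(m2)),
                  BIC   = c(BIC(m1), BIC(m2)))
tab
```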
#--- base model ---#
rmr.1 <- glm(RESTING_MgO2.hr_RESPR ~ 1+ REGION * TEMPERATURE + MASS_CENTERED,
family=gaussian(),
data = resp3)

summary
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 0.9071699 | 1.8396670 | 0.4931164 | 0.6225248 |
| REGIONLeading | -1.0631773 | 2.7553334 | -0.3858616 | 0.7000498 |
| TEMPERATURE | 0.1759916 | 0.0632351 | 2.7831307 | 0.0059520 |
| MASS_CENTERED | 133.8254357 | 9.9843605 | 13.4035060 | 0.0000000 |
| REGIONLeading:TEMPERATURE | 0.0376457 | 0.0946023 | 0.3979365 | 0.6911434 |
#--- experimental rmr equipment hypothesis ---#
rmr.2 <- glm(RESTING_MgO2.hr_RESPR ~ 1+ REGION * TEMPERATURE + RESTING_SUMP +
RESTING_AM_PM + RESTING_RUNTIME_SECONDS,
family=gaussian(),
data = resp3)

summary
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 0.0417266 | 2.5593818 | 0.0163034 | 0.9870104 |
| REGIONLeading | -1.8177666 | 3.7681132 | -0.4824077 | 0.6301025 |
| TEMPERATURE | 0.3212288 | 0.0949708 | 3.3823948 | 0.0008816 |
| RESTING_SUMP2 | -0.0016294 | 0.2215833 | -0.0073533 | 0.9941411 |
| RESTING_AM_PMPM | -0.4196626 | 0.2161103 | -1.9418904 | 0.0537114 |
| RESTING_RUNTIME_SECONDS | -0.0001616 | 0.0000513 | -3.1477047 | 0.0019264 |
| REGIONLeading:TEMPERATURE | 0.0125324 | 0.1293602 | 0.0968801 | 0.9229294 |
| model | df | AICc | BIC | r2 |
|---|---|---|---|---|
| rmr.1 | 6 | 561.9094 | 580.8294 | 0.6058512 |
| rmr.2 | 8 | 680.4894 | 705.5293 | 0.2768734 |
The model that contains MASS_CENTERED seems to do better than the model that incorporates variables associated with the time that experiments are performed. This is demonstrated by the lower AICc and BIC scores, as well as the higher r-squared value. However, RESTING_RUNTIME_SECONDS was a significant variable in model 2. Let's see what a third model looks like if we include both MASS_CENTERED and RESTING_RUNTIME_SECONDS.
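The third model (rmr.3) is not printed above; its likely form, combining the mass and runtime covariates, is sketched here. The data are simulated stand-ins so the chunk runs on its own; the real fit would use resp3.

```r
# Simulated stand-in for resp3 (columns invented for illustration).
set.seed(3)
demo <- data.frame(
  REGION                  = factor(rep(c("Core", "Leading"), 40)),
  TEMPERATURE             = rep(c(27, 28.5, 30, 31.5), each = 20),
  MASS_CENTERED           = rnorm(80, 0, 0.002),
  RESTING_RUNTIME_SECONDS = rnorm(80, 14000, 1000)
)
demo$RESTING_MgO2.hr_RESPR <- 8 + 0.2 * demo$TEMPERATURE +
  130 * demo$MASS_CENTERED + rnorm(80, sd = 0.5)

# Third model: region-by-temperature interaction plus both covariates.
rmr.3 <- glm(RESTING_MgO2.hr_RESPR ~ 1 + REGION * TEMPERATURE +
               MASS_CENTERED + RESTING_RUNTIME_SECONDS,
             family = gaussian(),
             data = demo)
names(coef(rmr.3))
```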
| model | df | AICc | BIC | r2 |
|---|---|---|---|---|
| rmr.1 | 6 | 561.9094 | 580.8294 | 0.6058512 |
| rmr.2 | 8 | 680.4894 | 705.5293 | 0.2768734 |
| rmr.3 | 7 | 544.9015 | 566.8936 | 0.6426906 |
It looks like the third model is better than the previous two. Next we will test whether the variable temperature performs best as a 1^st (linear), 2^nd (quadratic), or 3^rd (cubic) order polynomial, as the relationship between temperature and resting oxygen consumption is predicted to be non-linear.
Note that the linear model has already been created via model rmr.3 in the previous section.
rmr.3.p2 <- glm(RESTING_MgO2.hr_RESPR ~ 1+ REGION * poly(TEMPERATURE, 2) + MASS_CENTERED + RESTING_RUNTIME_SECONDS,
family=gaussian(),
data = resp3)
rmr.3.p3 <- glm(RESTING_MgO2.hr_RESPR ~ 1+ REGION * poly(TEMPERATURE, 3) + MASS_CENTERED + RESTING_RUNTIME_SECONDS,
family=gaussian(),
data = resp3)

polynomial model comparisons
| model | df | AICc | BIC | r2 |
|---|---|---|---|---|
| rmr.3 | 7 | 544.9015 | 566.8936 | 0.6489236 |
| rmr.3.p2 | 9 | 545.1143 | 573.1773 | 0.6566813 |
| rmr.3.p3 | 11 | 548.8463 | 582.8799 | 0.6580731 |
From our model comparison we can see that there is no additional benefit to including temperature as a 2^nd or 3^rd order polynomial. However, the linear and quadratic models both perform well.
Fish were repeatedly sampled over four different temperatures, therefore repeated sampling needs to be accounted for. To do this, random factors will be included within the model. There are a number of options for random factors, including 1) accounting for repeated sampling of individuals, 2) accounting for repeated sampling of individuals nested within population, and 3) accounting for repeated sampling of individuals and populations without nesting. All three models will be run and compared.
rmr.3a <- glmmTMB(RESTING_MgO2.hr_RESPR ~ 1+ REGION * TEMPERATURE + MASS_CENTERED + RESTING_RUNTIME_SECONDS + (1|FISH_ID),
family=gaussian(),
data = resp3,
REML = TRUE)
rmr.3b <- glmmTMB(RESTING_MgO2.hr_RESPR ~ 1+ REGION * TEMPERATURE + MASS_CENTERED + RESTING_RUNTIME_SECONDS + (1|POPULATION/FISH_ID),
family=gaussian(),
data = resp3,
REML = TRUE)
rmr.3c <- glmmTMB(RESTING_MgO2.hr_RESPR ~ 1+ REGION * TEMPERATURE + MASS_CENTERED + RESTING_RUNTIME_SECONDS + (1|FISH_ID) + (1|POPULATION),
family=gaussian(),
data = resp3,
REML = TRUE)

random factor model comparisons
| model | df | AICc | BIC | r2m | r2c |
|---|---|---|---|---|---|
| rmr.3a | 8 | 559.0743 | 584.1142 | 0.642662 | 0.7293407 |
| rmr.3b | 9 | 561.2823 | 589.3453 | 0.642662 | 0.7293407 |
| rmr.3c | 9 | 561.2823 | 589.3453 | 0.642662 | 0.7293407 |
Model rmr.3a appears to be the best model; however, there seems to be little to no difference between the models regardless of how the random factors are arranged.
The rmr.3a model performs well; however, in the model validation performed by the performance package it looks like there are two variables that are highly correlated. If we expand the figure we can see that the highly correlated variables are REGION and REGION:TEMPERATURE. Perhaps this is unsurprising, but let's see what happens when we run the quadratic (2^nd order polynomial) model to see if this helps deal with the high correlation between these two variables, as it performed very similarly to rmr.3a, and even had a higher r2 value.
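Why would orthogonal polynomials help? On a balanced toy design (simulated here, not the real data) the design-matrix columns show the problem directly: with raw TEMPERATURE the REGION dummy and the REGION:TEMPERATURE column are almost perfectly correlated, while `poly()` produces centred columns whose interaction is uncorrelated with the dummy.

```r
# Balanced toy design: two regions, four test temperatures.
d <- data.frame(REGION      = factor(rep(c("Core", "Leading"), each = 40)),
                TEMPERATURE = rep(c(27, 28.5, 30, 31.5), 20))

X_raw  <- model.matrix(~ REGION * TEMPERATURE, d)
X_poly <- model.matrix(~ REGION * poly(TEMPERATURE, 2), d)

# Correlation between the REGION dummy and its interaction column.
cor_raw  <- cor(X_raw[, "REGIONLeading"],
                X_raw[, "REGIONLeading:TEMPERATURE"])
cor_poly <- cor(X_poly[, "REGIONLeading"],
                X_poly[, "REGIONLeading:poly(TEMPERATURE, 2)1"])
round(c(raw = cor_raw, poly = cor_poly), 3)
```

Because the temperatures sit far from zero, the raw interaction column is nearly proportional to the dummy itself; the orthogonal polynomial columns are mean-centred, which removes that overlap.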
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.516 0.716 0.216 0.896 0.868 0.824 0.736 0.764 0.756 0.852 0.384 0.136 0.128 0.1 0 0.344 0.184 0.336 0.324 0.016 ...
## $uniformity
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.042545, p-value = 0.8874
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 0.97177, p-value = 0.792
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test with approximate
## expectations
##
## data: simulationOutput
## outliers at both margin(s) = 1, observations = 187, p-value = 1
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
## 0.0001353802 0.0294332214
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 )
## 0.005347594
First we need to update the model by adding in the missing random factor.
rmr.3.p2a <- glmmTMB(RESTING_MgO2.hr_RESPR ~ 1+ REGION * poly(TEMPERATURE, 2) + MASS_CENTERED + RESTING_RUNTIME_SECONDS + (1|FISH_ID),
family=gaussian(),
data = resp3,
REML = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.444 0.784 0.164 0.932 0.884 0.764 0.688 0.852 0.792 0.824 0.308 0.168 0.16 0.06 0 0.408 0.212 0.272 0.272 0.036 ...
## $uniformity
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.04477, p-value = 0.8477
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 0.96383, p-value = 0.76
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test with approximate
## expectations
##
## data: simulationOutput
## outliers at both margin(s) = 1, observations = 187, p-value = 1
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
## 0.0001353802 0.0294332214
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 )
## 0.005347594
It looks like the model that treats temperature as a second order polynomial does a better job at avoiding high levels of collinearity within the model. The quadratic model will be used moving forward.
| Estimate | StdError | Zvalue | Pvalue | |
|---|---|---|---|---|
| (Intercept) | 8.3165021 | 0.5180504 | 16.0534626 | 0.0000000 |
| REGIONLeading | 0.0172115 | 0.2341284 | 0.0735129 | 0.9413980 |
| poly(TEMPERATURE, 2)1 | 6.5820041 | 1.3078841 | 5.0325592 | 0.0000005 |
| poly(TEMPERATURE, 2)2 | 2.7374223 | 1.1855357 | 2.3090171 | 0.0209426 |
| MASS_CENTERED | 133.3083220 | 12.2781446 | 10.8573670 | 0.0000000 |
| RESTING_RUNTIME_SECONDS | -0.0001467 | 0.0000321 | -4.5683146 | 0.0000049 |
| REGIONLeading:poly(TEMPERATURE, 2)1 | 0.8398236 | 1.7827144 | 0.4710926 | 0.6375746 |
| REGIONLeading:poly(TEMPERATURE, 2)2 | -2.2347739 | 1.7668476 | -1.2648368 | 0.2059298 |
| Chisq | Df | Pr(>Chisq) | |
|---|---|---|---|
| REGION | 0.003263 | 1 | 0.9544475 |
| poly(TEMPERATURE, 2) | 51.567789 | 2 | 0.0000000 |
| MASS_CENTERED | 117.882419 | 1 | 0.0000000 |
| RESTING_RUNTIME_SECONDS | 20.869498 | 1 | 0.0000049 |
| REGION:poly(TEMPERATURE, 2) | 1.823912 | 2 | 0.4017376 |
| 2.5 % | 97.5 % | Estimate | |
|---|---|---|---|
| (Intercept) | 7.3011420 | 9.3318621 | 8.3165021 |
| REGIONLeading | -0.4416717 | 0.4760947 | 0.0172115 |
| poly(TEMPERATURE, 2)1 | 4.0185984 | 9.1454098 | 6.5820041 |
| poly(TEMPERATURE, 2)2 | 0.4138150 | 5.0610297 | 2.7374223 |
| MASS_CENTERED | 109.2436009 | 157.3730432 | 133.3083220 |
| RESTING_RUNTIME_SECONDS | -0.0002096 | -0.0000837 | -0.0001467 |
| REGIONLeading:poly(TEMPERATURE, 2)1 | -2.6542324 | 4.3338797 | 0.8398236 |
| REGIONLeading:poly(TEMPERATURE, 2)2 | -5.6977316 | 1.2281839 | -2.2347739 |
| Std.Dev.(Intercept)|FISH_ID | 0.3510707 | 0.7291319 | 0.5059415 |
| R2_conditional | R2_marginal | optional |
|---|---|---|
| 0.7370331 | 0.6487262 | FALSE |
rmr.3.p2a %>% emtrends(var = "TEMPERATURE", type = "response") %>% pairs(by = "TEMPERATURE") %>% summary(by = NULL, adjust = "tukey", infer=TRUE)

SCROLL TO THE RIGHT -->
The numbers in the left-most column of the table indicate that the slopes assume mean MASS_CENTERED and RESTING_RUNTIME_SECONDS values when looking at differences between latitudinal slopes.
rmr.3.p2a %>% emmeans(pairwise ~ TEMPERATURE*REGION, type = "response") %>% pairs(by = "TEMPERATURE") %>% summary(by = NULL, adjust = "tukey", infer=TRUE)
rmr.3.p2a %>% update(.~1+ REGION * as.factor(TEMPERATURE) + MASS_CENTERED + RESTING_RUNTIME_SECONDS + (1|FISH_ID)) %>%
  emmeans(~REGION*TEMPERATURE, type = "response") %>% summary(infer=TRUE)
rmr.3.p2a %>% update(.~1+ REGION * as.factor(TEMPERATURE) + MASS_CENTERED + RESTING_RUNTIME_SECONDS + (1|FISH_ID)) %>%
  emmeans(~REGION*TEMPERATURE, type = "response") %>% pairs(by = "REGION") %>% summary(infer=TRUE)
rmr.emm <- rmr.3.p2a %>% emmeans(~REGION*TEMPERATURE)
eff_size(rmr.emm, sigma = sigma(rmr.3.p2a), edf = df.residual(rmr.3.p2a))

## contrast
## Core TEMPERATURE29.0855614973262 - Leading TEMPERATURE29.0855614973262
## effect.size SE df lower.CL upper.CL
## -0.249 0.324 185 -0.889 0.391
##
## sigma used for effect sizes: 0.8731
## Confidence level used: 0.95
For details on the experiment performed please read the information at the top of this document. In brief, Acanthochromis polyacanthus from two different regions on the Great Barrier Reef (GBR) were tested for metabolic performance at four different temperatures: 27\(^\circ\)C, 28.5\(^\circ\)C, 30\(^\circ\)C, and 31.5\(^\circ\)C. Fish used in this study were collected from two different regions, low- (i.e., Cairns) and high-latitude (i.e., Mackay); within each region fish were collected from a total of three different populations. At each temperature, resting and maximum oxygen consumption were measured for each individual. Absolute aerobic scope was calculated using the following formula:
Absolute aerobic scope = (maximum oxygen consumption - resting oxygen consumption)
Individuals were first tested at 27\(^\circ\)C. Water temperature was then increased at a rate of 0.5\(^\circ\)C per day until the next temperature was reached. Fish were then provided with an additional 5 days to adjust to the new temperature before aerobic physiology was tested again.
Three traits are included within the aerobic physiology analysis: resting oxygen consumption, maximum oxygen consumption, and absolute aerobic scope. Data for each metric were collected from respirometry experiments recorded via a combination of programs, including AquaResp and PyroScience. Slopes (i.e., resting and maximum oxygen consumption values) were then calculated via the respR [https://januarharianto.github.io/respR/articles/respR.html] package.
Before beginning, always make sure that you are working in the correct directory.
Now we can import the data. Replace import data with the PATH to your data file. I have simply labelled my PATH import.data (i.e. import.data = “PATH TO MY FILE”).
Before the data can be analysed it is important to clean up the data file. Below a number of adjustments are made, primarily making sure that columns are treated appropriately as either factors, numeric, or time, as well as renaming some columns. Once these changes are made the data is saved into a new dataframe called resp2.
resp2 = resp %>%
dplyr::rename(EXP_FISH_ID = FISH_ID) %>%
separate(EXP_FISH_ID, c("FISH_ID"), remove = FALSE) %>%
mutate(FISH_ID = factor(FISH_ID),
POPULATION = factor(POPULATION),
REGION = factor(REGION),
TEMPERATURE = as.numeric(TEMPERATURE), #run with temperature as a factor
RESTING_DATE = factor(RESTING_DATE),
RESTING_CHAMBER = factor(RESTING_CHAMBER),
RESTING_SYSTEM = factor(RESTING_SYSTEM),
RESTING_SUMP = factor(RESTING_SUMP),
RESTING_AM_PM = factor(RESTING_AM_PM),
RESTING_START_TIME = hms(RESTING_START_TIME),
RESTING_END_TIME = hms(RESTING_ENDTIME),
MAX_DATE = factor(MAX_DATE),
MAX_CHAMBER = factor(MAX_CHAMBER),
MAX_SYSTEM = factor(MAX_SYSTEM),
MAX_SUMP = factor(MAX_SUMP),
MAX_AM_PM = factor(MAX_AM_PM),
MAX_START_TIME = hms(MAX_START_TIME),
Swim.performance = factor(Swim.performance),
NAS = as.numeric(NAS),
FAS = as.numeric(FAS),
MgO2.hr_Net = as.numeric(MgO2.hr_Net),
RESTING_RUNTIME_SECONDS = as.numeric(hms(RESTING_RUNTIME))) %>%
dplyr::rename(MASS = WEIGHT) %>%
mutate(MASS_CENTERED = scale(MASS, scale = FALSE, center = TRUE))

Next, select data points will be removed. Beside the removed data points I have provided the reasoning for their exclusion, such as the fish died during the experiment, or the data quality was poor - which likely indicates that there was an issue with the equipment during the trial.
resp3 <- resp2 %>%
subset(
EXP_FISH_ID !="LCHA127_27" & # deceased during experiment
EXP_FISH_ID !="LCHA132_27" & # deceased during experiment
EXP_FISH_ID !="LKES168_27" # poor data quality
)

So far the analysis has been the same as the protocol outlined for the resting oxygen consumption data. One additional data removal step takes place in the maximum oxygen consumption analysis, for samples where fish swam poorly and therefore their maximum oxygen consumption data is thought to be unreliable. This step is done before any data analysis has taken place.
resp4 <- resp3 %>%
subset(
EXP_FISH_ID !="CSUD008_27" & # poor swim
EXP_FISH_ID !="CSUD008_30" & # poor swim
EXP_FISH_ID !="CSUD008_28.5" & # poor swim
EXP_FISH_ID !="CSUD018_31.5" & # poor swim
EXP_FISH_ID !="CSUD026_30" & # max. value low
EXP_FISH_ID !="CSUD074_28.5" & # fas value low
EXP_FISH_ID !="CSUD079_30" &
EXP_FISH_ID !="CVLA052_27" & #nas value low
EXP_FISH_ID !="CVLA054_28.5" & # low max value?
EXP_FISH_ID !="LCHA113_27" & # poor data quality
EXP_FISH_ID !="LCHA113_30" & # poor swim
EXP_FISH_ID !="LCHA127_27" # deceased during experiment
)

ggplot(resp4, aes(MASS, MAX_MgO2.hr_RESPR)) +
geom_point() +
geom_smooth(method = "lm") +
theme_classic()
ggplot(resp4, aes(MASS, MAX_MgO2.hr_RESPR, color = REGION)) +
geom_point() +
theme_classic() +
geom_smooth(method = "lm")

The model was fit using the glm and later the glmmTMB package in R. A number of different models were tested to determine which hypothesis and associated variables best predicted maximum oxygen consumption. Model fit was examined using AICc, BIC, and r-squared values. Additional model validation was performed via the diagnostics provided by the performance and DHARMa packages in R.
#--- base model ---#
mmr.1 <- glm(MAX_MgO2.hr_RESPR ~ 1+ REGION * TEMPERATURE + MASS_CENTERED,
family=gaussian(),
data = resp4)

summary
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | -4.5748341 | 4.4522838 | -1.027525 | 0.3056245 |
| REGIONLeading | 17.6468408 | 6.5852635 | 2.679747 | 0.0080880 |
| TEMPERATURE | 0.6944978 | 0.1530872 | 4.536615 | 0.0000107 |
| MASS_CENTERED | 298.1596890 | 24.1436460 | 12.349406 | 0.0000000 |
| REGIONLeading:TEMPERATURE | -0.6609056 | 0.2261837 | -2.921986 | 0.0039482 |
#--- experimental mmr equipment hypothesis ---#
mmr.2 <- glm(MAX_MgO2.hr_RESPR ~ 1+ REGION * TEMPERATURE + MAX_SUMP + MAX_CHAMBER +
MAX_AM_PM,
family=gaussian(),
data = resp4)

summary
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | -6.2242856 | 6.4786135 | -0.9607435 | 0.3380701 |
| REGIONLeading | 16.0665387 | 9.1812310 | 1.7499330 | 0.0819666 |
| TEMPERATURE | 0.7876835 | 0.2182937 | 3.6083665 | 0.0004069 |
| MAX_SUMP2 | -0.1739027 | 0.5317501 | -0.3270383 | 0.7440484 |
| MAX_CHAMBER2 | 0.9203132 | 0.7549758 | 1.2189970 | 0.2245644 |
| MAX_CHAMBER3 | 1.1525268 | 0.7667736 | 1.5030861 | 0.1347055 |
| MAX_CHAMBER4 | 0.5061699 | 0.8313336 | 0.6088649 | 0.5434413 |
| MAX_AM_PMPM | -0.4757712 | 0.5382845 | -0.8838656 | 0.3780394 |
| REGIONLeading:TEMPERATURE | -0.7177722 | 0.3148211 | -2.2799368 | 0.0238764 |
| model | df | AICc | BIC | r2 |
|---|---|---|---|---|
| mmr.1 | 6 | 828.4562 | 846.9821 | 0.6622066 |
| mmr.2 | 10 | 945.6551 | 976.0266 | 0.3732903 |
The model that contains MASS_CENTERED seems to do better than the model that incorporates variables associated with the equipment and time of day that experiments were performed. This is demonstrated by the lower AICc and BIC scores, as well as the higher r-squared value.
Note that the linear model has already been created via model mmr.1 in the previous section.
mmr.1.p2 <- glm(MAX_MgO2.hr_RESPR ~ 1+ REGION * poly(TEMPERATURE, 2) + MASS_CENTERED,
family=gaussian(),
data = resp4)
mmr.1.p3 <- glm(MAX_MgO2.hr_RESPR ~ 1+ REGION * poly(TEMPERATURE, 3) + MASS_CENTERED,
family=gaussian(),
data = resp4)

polynomial model comparisons
| model | df | AICc | BIC | r2 |
|---|---|---|---|---|
| mmr.1 | 6 | 828.4562 | 846.9821 | 0.6673593 |
| mmr.1.p2 | 8 | 832.3856 | 856.8872 | 0.6681820 |
| mmr.1.p3 | 10 | 836.8419 | 867.2134 | 0.6682098 |
From our model comparison we can see that there is no additional benefit to including temperature as a 2nd or 3rd order polynomial. However, the linear and quadratic models both perform well.
Fish were repeatedly sampled over four different temperatures, therefore repeated sampling needs to be accounted for. To do this, random factors will be included within the model. There are a number of options for random factors, including 1) accounting for repeated sampling of individuals, 2) accounting for repeated sampling of individuals nested within population, and 3) accounting for repeated sampling of individuals and populations without nesting. All three models will be run and compared.
mmr.1a <- glmmTMB(MAX_MgO2.hr_RESPR ~ 1+ REGION * TEMPERATURE + MASS_CENTERED + (1|FISH_ID),
family=gaussian(),
data = resp4,
REML = TRUE)
mmr.1b <- glmmTMB(MAX_MgO2.hr_RESPR ~ 1+ REGION * TEMPERATURE + MASS_CENTERED + (1|POPULATION/FISH_ID),
family=gaussian(),
data = resp4,
REML = TRUE)
mmr.1c <- glmmTMB(MAX_MgO2.hr_RESPR ~ 1+ REGION * TEMPERATURE + MASS_CENTERED + (1|FISH_ID) + (REGION|POPULATION),
family=gaussian(),
data = resp4,
REML = TRUE) # convergence problem

random factor model comparisons
| model | df | AICc | BIC | r2m | r2c |
|---|---|---|---|---|---|
| mmr.1a | 7 | 809.5847 | 831.1115 | 0.6693399 | 0.7836732 |
| mmr.1b | 8 | 811.5475 | 836.0491 | 0.6694043 | 0.7848247 |
| mmr.1c | 10 | NA | NA | 0.6697666 | 0.7848011 |
Model mmr.1a appears to be the best model; however, there seems to be little difference between the models regardless of how the random factors are arranged.
The mmr.1a model performs well; however, in the model validation performed by the performance package it looks like there are two variables that are highly correlated. If we expand the figure we can see that the highly correlated variables are REGION and REGION:TEMPERATURE. Perhaps this is unsurprising, but let's see what happens when we run the quadratic (2nd order polynomial) model to see if this helps deal with the high correlation between these two variables, as it performed very similarly to mmr.1a, and even had a higher r2 value.
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.212 0.328 0.456 0.744 0.812 0.852 1 0.7 0.372 0.684 0.692 0.032 0.216 0.204 0.612 0.084 0.452 0.988 0.936 0.848 ...
## $uniformity
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.0715, p-value = 0.3293
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 0.94899, p-value = 0.68
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test with approximate
## expectations
##
## data: simulationOutput
## outliers at both margin(s) = 3, observations = 176, p-value = 0.1665
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
## 0.003529072 0.049003686
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 )
## 0.01704545
First we need to update the model by adding in the missing random factor
mmr.1.p2a <- glmmTMB(MAX_MgO2.hr_RESPR ~ 1+ REGION * poly(TEMPERATURE, 2) + MASS_CENTERED + (1|FISH_ID),
family=gaussian(),
data = resp4,
                REML = TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.228 0.308 0.492 0.728 0.788 0.876 1 0.728 0.356 0.664 0.724 0.04 0.188 0.192 0.644 0.092 0.42 0.98 0.936 0.836 ...
## $uniformity
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.076136, p-value = 0.2594
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 0.94042, p-value = 0.608
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test with approximate
## expectations
##
## data: simulationOutput
## outliers at both margin(s) = 3, observations = 176, p-value = 0.1665
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
## 0.003529072 0.049003686
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 )
## 0.01704545
It looks like the model that treats temperature as a second-order polynomial does a better job of avoiding high levels of collinearity within the model. The quadratic model will be used moving forward because it avoids this collinearity while performing comparably well.
| Estimate | StdError | Zvalue | Pvalue | |
|---|---|---|---|---|
| (Intercept) | 15.665104 | 0.3860064 | 40.5824948 | 0.0000000 |
| REGIONLeading | -1.540213 | 0.6378566 | -2.4146702 | 0.0157495 |
| poly(TEMPERATURE, 2)1 | 14.695118 | 2.8646793 | 5.1297602 | 0.0000003 |
| poly(TEMPERATURE, 2)2 | -2.507947 | 2.8404686 | -0.8829342 | 0.3772718 |
| MASS_CENTERED | 313.864413 | 34.0009493 | 9.2310485 | 0.0000000 |
| REGIONLeading:poly(TEMPERATURE, 2)1 | -13.979779 | 4.2142722 | -3.3172464 | 0.0009091 |
| REGIONLeading:poly(TEMPERATURE, 2)2 | 1.659481 | 4.1665787 | 0.3982838 | 0.6904210 |
| Chisq | Df | Pr(>Chisq) | |
|---|---|---|---|
| REGION | 5.244328 | 1 | 0.0220184 |
| poly(TEMPERATURE, 2) | 16.284260 | 2 | 0.0002910 |
| MASS_CENTERED | 85.212257 | 1 | 0.0000000 |
| REGION:poly(TEMPERATURE, 2) | 11.182851 | 2 | 0.0037297 |
| 2.5 % | 97.5 % | Estimate | |
|---|---|---|---|
| (Intercept) | 14.908545 | 16.4216626 | 15.665104 |
| REGIONLeading | -2.790389 | -0.2900373 | -1.540213 |
| poly(TEMPERATURE, 2)1 | 9.080450 | 20.3097864 | 14.695118 |
| poly(TEMPERATURE, 2)2 | -8.075163 | 3.0592692 | -2.507947 |
| MASS_CENTERED | 247.223777 | 380.5050495 | 313.864413 |
| REGIONLeading:poly(TEMPERATURE, 2)1 | -22.239601 | -5.7199575 | -13.979779 |
| REGIONLeading:poly(TEMPERATURE, 2)2 | -6.506863 | 9.8258252 | 1.659481 |
| Std.Dev.(Intercept)|FISH_ID | 1.054358 | 2.1168357 | 1.493956 |
| R2_conditional | R2_marginal | optional |
|---|---|---|
| 0.7829506 | 0.6683685 | FALSE |
mmr.1.p2a %>% emtrends(var = "TEMPERATURE", type = "response") %>% pairs(by = "TEMPERATURE") %>% summary(by = NULL, adjust = "tukey", infer=TRUE)
SCROLL TO THE RIGHT –>
The numbers in the left-most column of the table indicate that the slopes assume mean MASS_CENTERED values when comparing slopes between latitudinal regions.
mmr.1.p2a %>% emmeans(pairwise ~ TEMPERATURE*REGION, type = "response") %>% pairs(by = "TEMPERATURE") %>% summary(by = NULL, adjust = "tukey", infer=TRUE)
mmr.1.p2a %>% update(.~1+ REGION * as.factor(TEMPERATURE) + MASS_CENTERED + RESTING_RUNTIME_SECONDS + (1|FISH_ID)) %>%
  emmeans(~REGION*TEMPERATURE, type = "response") %>% summary(infer=TRUE)
mmr.1.p2a %>% update(.~1+ REGION * as.factor(TEMPERATURE) + MASS_CENTERED + RESTING_RUNTIME_SECONDS + (1|FISH_ID)) %>%
  emmeans(~REGION*TEMPERATURE, type = "response") %>% pairs(by ="REGION") %>% summary(infer=TRUE)
mmr.emm <- mmr.1.p2a %>% emmeans(~REGION*TEMPERATURE)
eff_size(mmr.emm, sigma = sigma(mmr.1.p2a), edf=df.residual(mmr.1.p2a))
## contrast
## Core TEMPERATURE29.0965909090909 - Leading TEMPERATURE29.0965909090909
## effect.size SE df lower.CL upper.CL
## 0.825 0.364 174 0.106 1.54
##
## sigma used for effect sizes: 2.056
## Confidence level used: 0.95
For details on the experiment performed please read the information at the top of this document. In brief, Acanthochromis polyacanthus from two different regions on the Great Barrier Reef (GBR) were tested for metabolic performance at four different temperatures: 27\(^\circ\)C, 28.5\(^\circ\)C, 30\(^\circ\)C, and 31.5\(^\circ\)C. Fish used in this study were collected from two different regions, low-latitude (i.e., Cairns) and high-latitude (i.e., Mackay); within each region fish were collected from a total of three different populations. At each temperature, resting oxygen consumption and maximum oxygen consumption were measured for each individual. Absolute aerobic scope was calculated using the following formula:
Absolute aerobic scope = (maximum oxygen consumption - resting oxygen consumption)
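As a minimal illustration of this formula (the resting-consumption column name below is a hypothetical placeholder; only MAX_MgO2.hr_RESPR appears elsewhere in this document):

```r
library(dplyr)

# Absolute (net) aerobic scope = maximum - resting oxygen consumption;
# RESTING_MgO2.hr_RESPR is an assumed column name for illustration only
resp2 <- resp2 %>%
  mutate(MgO2.hr_Net = MAX_MgO2.hr_RESPR - RESTING_MgO2.hr_RESPR)
```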
Individuals were first tested at 27\(^\circ\)C. Water temperature was then increased at a rate of 0.5\(^\circ\)C Day^-1 until the next treatment temperature was reached. Fish were then given an additional 5 days to adjust to the new temperature before aerobic physiology was tested again.
Three traits are included within the aerobic physiology analysis: resting oxygen consumption, maximum oxygen consumption, and absolute aerobic scope. Data for each metric were collected from respirometry experiments recorded via a combination of programs, including AquaResp3 and PyroScience. Slopes (i.e., resting and maximum oxygen consumption values) were then calculated via the respR [https://januarharianto.github.io/respR/articles/respR.html] package.
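An illustrative sketch of how respR can extract such slopes (the actual extraction scripts are not included in this document, and the data-frame name and argument values here are assumptions):

```r
library(respR)

# o2_data is assumed to be a two-column data frame: time (s) and O2.
# Extract the steepest 60-second slope, as used for maximum oxygen
# consumption; method and width are assumed values
mmr_rate <- auto_rate(o2_data, method = "highest", width = 60, by = "time")
```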
Note: Absolute aerobic scope is sometimes called net aerobic scope. When building the models, labelling was done using ‘net aerobic scope’ (i.e., nas).
Before beginning always make sure that you are working in the correct directory
Now we can import the data. Replace import.data with the PATH to your data file. I have stored my PATH in the object import.data (i.e., import.data = “PATH TO MY FILE”).
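A minimal sketch of this import step (assuming a delimited text file and that `import.data` holds the path, per the note above):

```r
library(readr)

# import.data is assumed to contain the path to the raw respirometry file
resp <- read_csv(import.data)
```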
Before the data can be analysed it is important to clean up the data file. Below a number of adjustments are made, primarily making sure that columns are being treated appropriately as either factors, numeric, or as time, as well as the renaming of some columns. Once these changes are made the data is being saved into a new dataframe called resp2
resp2 = resp %>%
dplyr::rename(EXP_FISH_ID = FISH_ID) %>%
separate(EXP_FISH_ID, c("FISH_ID"), remove = FALSE) %>%
mutate(FISH_ID = factor(FISH_ID),
POPULATION = factor(POPULATION),
REGION = factor(REGION),
TEMPERATURE = as.numeric(TEMPERATURE), #run with temperature as a factor
RESTING_DATE = factor(RESTING_DATE),
RESTING_CHAMBER = factor(RESTING_CHAMBER),
RESTING_SYSTEM = factor(RESTING_SYSTEM),
RESTING_SUMP = factor(RESTING_SUMP),
RESTING_AM_PM = factor(RESTING_AM_PM),
RESTING_START_TIME = hms(RESTING_START_TIME),
RESTING_END_TIME = hms(RESTING_ENDTIME),
MAX_DATE = factor(MAX_DATE),
MAX_CHAMBER = factor(MAX_CHAMBER),
MAX_SYSTEM = factor(MAX_SYSTEM),
MAX_SUMP = factor(MAX_SUMP),
MAX_AM_PM = factor(MAX_AM_PM),
MAX_START_TIME = hms(MAX_START_TIME),
Swim.performance = factor(Swim.performance),
NAS = as.numeric(NAS),
FAS = as.numeric(FAS),
MgO2.hr_Net = as.numeric(MgO2.hr_Net),
RESTING_RUNTIME_SECONDS = as.numeric(hms(RESTING_RUNTIME))) %>%
dplyr::rename(MASS = WEIGHT) %>%
  mutate(MASS_CENTERED = scale(MASS, scale = FALSE, center = TRUE))
Next, select data points will be removed. Beside the removed data points I have provided the reasoning for their exclusion, such as the fish dying during the experiment, or poor data quality - which likely indicates that there was an issue with the equipment during the trial.
resp3 <- resp2 %>%
subset(
EXP_FISH_ID !="LCHA127_27" & # deceased during experiment
EXP_FISH_ID !="LCHA132_27" & # deceased during experiment
EXP_FISH_ID !="LKES168_27" # poor data quality
  )
So far the analysis has been the same as the protocol outlined in the maximum oxygen consumption data.
resp4 <- resp3 %>%
subset(
EXP_FISH_ID !="CSUD008_27" & # poor swim
EXP_FISH_ID !="CSUD008_30" & # poor swim
EXP_FISH_ID !="CSUD008_28.5" & # poor swim
EXP_FISH_ID !="CSUD018_31.5" & # poor swim
EXP_FISH_ID !="CSUD026_30" & # max. value low
EXP_FISH_ID !="CSUD074_28.5" & # fas value low
EXP_FISH_ID !="CSUD079_30" &
EXP_FISH_ID !="CVLA052_27" & #nas value low
EXP_FISH_ID !="CVLA054_28.5" & # low max value?
EXP_FISH_ID !="LCHA113_27" & # poor data quality
EXP_FISH_ID !="LCHA113_30" & # poor swim
EXP_FISH_ID !="LCHA127_27" # deceased during experiment
) ggplot(resp4, aes(MASS, MgO2.hr_Net, color = REGION)) +
geom_point() +
theme_classic() +
  geom_smooth(method = "lm")
The model was fit using the glm and later the glmmTMB package in R. A number of different models were tested to determine which hypothesis and associated variables best predicted net aerobic scope. Model fit was examined using AICc, BIC, and r-squared values. Additional models were examined via the validation diagnostics provided by the performance and DHARMa packages in R.
#--- base model ---#
nas.1 <- glm(MgO2.hr_Net ~ 1+ REGION * TEMPERATURE + MASS_CENTERED,
family=gaussian(),
           data = resp4)
summary:
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | -4.6505476 | 4.5988210 | -1.011248 | 0.3133266 |
| REGIONLeading | 17.8617376 | 6.8020030 | 2.625953 | 0.0094246 |
| TEMPERATURE | 0.4921098 | 0.1581257 | 3.112143 | 0.0021773 |
| MASS_CENTERED | 167.6469322 | 24.9382811 | 6.722473 | 0.0000000 |
| REGIONLeading:TEMPERATURE | -0.6707332 | 0.2336280 | -2.870945 | 0.0046100 |
#--- experimental rmr equipment hypothesis ---#
nas.2 <- glm(MgO2.hr_Net ~ 1+ REGION * TEMPERATURE + MASS_CENTERED + RESTING_SUMP + RESTING_RUNTIME_SECONDS +
RESTING_AM_PM,
family=gaussian(),
           data = resp4)
summary:
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | -4.1166901 | 4.6900539 | -0.8777490 | 0.3813335 |
| REGIONLeading | 17.7878857 | 6.8578591 | 2.5937958 | 0.0103304 |
| TEMPERATURE | 0.4314945 | 0.1732534 | 2.4905397 | 0.0137265 |
| MASS_CENTERED | 167.2794598 | 25.2897560 | 6.6145146 | 0.0000000 |
| RESTING_SUMP2 | 0.0078043 | 0.4108966 | 0.0189932 | 0.9848690 |
| RESTING_RUNTIME_SECONDS | 0.0000818 | 0.0000936 | 0.8735816 | 0.3835932 |
| RESTING_AM_PMPM | -0.1069842 | 0.3960028 | -0.2701603 | 0.7873685 |
| REGIONLeading:TEMPERATURE | -0.6678804 | 0.2355090 | -2.8359020 | 0.0051315 |
| model | df | AICc | BIC | r2 |
|---|---|---|---|---|
| nas.1 | 6 | 839.8550 | 858.3808 | 0.440634 |
| nas.2 | 9 | 845.5166 | 872.9666 | 0.439167 |
The base model (nas.1) seems to do better than the model that incorporates variables associated with the equipment and timing of the experiments. This is demonstrated by the lower AICc and BIC scores, as well as the higher r-squared value.
Note that the linear model has already been created via model nas.1 in the previous section.
#--- second order polynomial ---#
nas.1.p2 <- glm(MgO2.hr_Net ~ 1+ REGION * poly(TEMPERATURE,2) + MASS_CENTERED,
family=gaussian(),
data = resp4)
#--- third order polynomial ---#
nas.1.p3 <- glm(MgO2.hr_Net ~ 1+ REGION * poly(TEMPERATURE, 3) + MASS_CENTERED,
family=gaussian(),
                data = resp4)
Polynomial model comparisons:
| model | df | AICc | BIC | r2 |
|---|---|---|---|---|
| nas.1 | 6 | 839.8550 | 858.3808 | 0.4463407 |
| nas.1.p2 | 8 | 840.4431 | 864.9447 | 0.4580961 |
| nas.1.p3 | 10 | 844.9023 | 875.2738 | 0.4581327 |
From our model comparison we can see that there is no additional benefit to the model from including temperature as a 2nd- or 3rd-order polynomial. However, the linear and quadratic models both perform well.
Fish were repeatedly sampled across the four temperatures, so this repeated sampling needs to be accounted for by including random factors in the model. There are a number of options for the random factors, including 1) accounting for repeated sampling of individuals, 2) accounting for repeated sampling of individuals nested within population, and 3) accounting for repeated sampling of individuals and populations without nesting. All three models will be run and compared.
nas.1a <- glmmTMB(MgO2.hr_Net ~ 1+ REGION * TEMPERATURE + MASS_CENTERED + (1|FISH_ID),
family=gaussian(),
data = resp4,
REML = TRUE)
nas.1b <- glmmTMB(MgO2.hr_Net ~ 1+ REGION * TEMPERATURE + MASS_CENTERED + (1|POPULATION/FISH_ID),
family=gaussian(),
data = resp4,
REML = TRUE)
nas.1c <- glmmTMB(MgO2.hr_Net ~ 1+ REGION * TEMPERATURE + MASS_CENTERED + (1|FISH_ID) + (REGION|POPULATION),
family=gaussian(),
data = resp4,
                REML = TRUE) # convergence problem
Random factor model comparisons:
| model | df | AICc | BIC | r2m | r2c |
|---|---|---|---|---|---|
| nas.1a | 7 | 828.7221 | 850.2488 | 0.4525497 | 0.5901940 |
| nas.1b | 8 | 830.8106 | 855.3122 | 0.4542611 | 0.5928177 |
| nas.1c | 10 | NA | NA | 0.4560622 | 0.5944637 |
Model nas.1a appears to be the best model, however, there seems to be little difference in how the models change depending on how the random factors are arranged.
The nas.1a model performs well; however, the model validation provided by the performance package suggests that two predictors are highly correlated. If we expand the figure we can see that the highly correlated terms are REGION and REGION:TEMPERATURE. Perhaps this is unsurprising, but let's see whether the quadratic (2nd-order polynomial) model helps deal with the high correlation between these two terms, as it performed very similarly to nas.1a and even had a higher r2 value.
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.212 0.244 0.556 0.532 0.74 0.728 1 0.712 0.596 0.868 0.788 0.364 0.28 0.416 0.744 0.124 0.676 0.996 0.788 0.864 ...
## $uniformity
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.063636, p-value = 0.4741
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 0.96215, p-value = 0.744
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test with approximate
## expectations
##
## data: simulationOutput
## outliers at both margin(s) = 2, observations = 176, p-value = 0.4095
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
## 0.001379164 0.040444565
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 )
## 0.01136364
First we need to update the model by adding in the missing random factor
nas.1.p2a <- glmmTMB(MgO2.hr_Net ~ 1+ REGION * poly(TEMPERATURE,2) + MASS_CENTERED + (1|FISH_ID),
family=gaussian(),
data = resp4,
                REML=TRUE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.268 0.2 0.648 0.472 0.664 0.8 1 0.756 0.488 0.812 0.844 0.4 0.224 0.352 0.792 0.172 0.596 0.996 0.876 0.828 ...
## $uniformity
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.060045, p-value = 0.5497
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 0.95158, p-value = 0.64
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test with approximate
## expectations
##
## data: simulationOutput
## outliers at both margin(s) = 2, observations = 176, p-value = 0.4095
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
## 0.001379164 0.040444565
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 )
## 0.01136364
It looks like the model that treats temperature as a second-order polynomial does a better job of avoiding high levels of collinearity within the model. The quadratic model will be used moving forward because it avoids this collinearity while performing comparably well.
| Estimate | StdError | Zvalue | Pvalue | |
|---|---|---|---|---|
| (Intercept) | 9.676637 | 0.3717003 | 26.033437 | 0.0000000 |
| REGIONLeading | -1.632353 | 0.6124654 | -2.665216 | 0.0076939 |
| poly(TEMPERATURE, 2)1 | 10.324322 | 3.0798654 | 3.352199 | 0.0008017 |
| poly(TEMPERATURE, 2)2 | -6.546341 | 3.0573845 | -2.141157 | 0.0322613 |
| MASS_CENTERED | 179.593117 | 32.6469695 | 5.501066 | 0.0000000 |
| REGIONLeading:poly(TEMPERATURE, 2)1 | -14.500327 | 4.5325854 | -3.199129 | 0.0013784 |
| REGIONLeading:poly(TEMPERATURE, 2)2 | 4.897260 | 4.4894335 | 1.090841 | 0.2753427 |
| Chisq | Df | Pr(>Chisq) | |
|---|---|---|---|
| REGION | 6.537818 | 1 | 0.0105605 |
| poly(TEMPERATURE, 2) | 6.515869 | 2 | 0.0384678 |
| MASS_CENTERED | 30.261721 | 1 | 0.0000000 |
| REGION:poly(TEMPERATURE, 2) | 11.460373 | 2 | 0.0032465 |
| 2.5 % | 97.5 % | Estimate | |
|---|---|---|---|
| (Intercept) | 8.9481173 | 10.4051557 | 9.676637 |
| REGIONLeading | -2.8327627 | -0.4319423 | -1.632353 |
| poly(TEMPERATURE, 2)1 | 4.2878974 | 16.3607477 | 10.324322 |
| poly(TEMPERATURE, 2)2 | -12.5387048 | -0.5539778 | -6.546341 |
| MASS_CENTERED | 115.6062324 | 243.5800011 | 179.593117 |
| REGIONLeading:poly(TEMPERATURE, 2)1 | -23.3840309 | -5.6166225 | -14.500327 |
| REGIONLeading:poly(TEMPERATURE, 2)2 | -3.9018683 | 13.6963875 | 4.897260 |
| Std.Dev.(Intercept)|FISH_ID | 0.8790124 | 1.9917277 | 1.323160 |
| R2_conditional | R2_marginal | optional |
|---|---|---|
| 0.6031021 | 0.4622269 | FALSE |
nas.1.p2a %>% emtrends(var = "TEMPERATURE", type = "response") %>% pairs(by = "TEMPERATURE") %>% summary(by = NULL, adjust = "tukey", infer=TRUE)
SCROLL TO THE RIGHT –>
The numbers in the left-most column of the table indicate that the slopes assume mean MASS_CENTERED values when comparing slopes between latitudinal regions.
nas.1.p2a %>% emmeans(pairwise ~ TEMPERATURE*REGION, type = "response") %>% pairs(by = "TEMPERATURE") %>% summary(by = NULL, adjust = "tukey", infer=TRUE)
nas.1.p2a %>% update(.~1+ REGION * as.factor(TEMPERATURE) + MASS_CENTERED + RESTING_RUNTIME_SECONDS + (1|FISH_ID)) %>%
  emmeans(~REGION*TEMPERATURE, type = "response") %>% summary(infer=TRUE)
nas.1.p2a %>% update(.~1+ REGION * as.factor(TEMPERATURE) + MASS_CENTERED + RESTING_RUNTIME_SECONDS + (1|FISH_ID)) %>%
  emmeans(~REGION*TEMPERATURE, type = "response") %>% pairs(by ="REGION") %>% summary(infer=TRUE)
aas.emm <- nas.1.p2a %>% emmeans(~REGION*TEMPERATURE)
eff_size(aas.emm, sigma = sigma(nas.1.p2a), edf=df.residual(nas.1.p2a))
## contrast
## Core TEMPERATURE29.0965909090909 - Leading TEMPERATURE29.0965909090909
## effect.size SE df lower.CL upper.CL
## 0.941 0.336 174 0.278 1.61
##
## sigma used for effect sizes: 2.221
## Confidence level used: 0.95
## Warning: Removed 4 rows containing missing values (`geom_point()`).
For full details see the information at the top of this document. In brief, Acanthochromis polyacanthus from two different regions on the Great Barrier Reef (GBR) were tested for metabolic performance at four different temperatures: 27\(^\circ\)C, 28.5\(^\circ\)C, 30\(^\circ\)C, and 31.5\(^\circ\)C. Fish used in this study were collected from two different regions, low-latitude (i.e., Cairns) and high-latitude (i.e., Mackay); within each region fish were collected from a total of three different populations. After metabolic performance was tested, blood and tissue samples were collected. White muscle tissue samples were used to examine the relationship between activity and temperature in two different enzymes, lactate dehydrogenase (LDH; anaerobic) and citrate synthase (CS; aerobic). Enzyme activity was measured at four different temperatures: 20\(^\circ\)C, 30\(^\circ\)C, 40\(^\circ\)C, and 50\(^\circ\)C. Enzyme activity was measured using a spectrometer, and wavelength absorbance levels were recorded using the software program LabX.
Before beginning always make sure that you are working in the correct directory
Now we can import the data. Two different data frames are being imported. The first has all the enzyme wavelength absorbance data for each sample, and the tissue.mass data file contains information pertaining to the tissue sample that was used for each fish. Later on these two data frames will be merged.
ldh <- read_delim("./enzymes/LDH_LocalAdapt.txt", delim = "\t",
escape_double = FALSE, col_types = cols(`Creation time` = col_datetime(format = "%d/%m/%Y %H:%M")),
trim_ws = TRUE)
tissue.mass <- read.delim("C:/Users/jc527762/OneDrive - James Cook University/PhD dissertation/Data/Chapter1_LocalAdaptation/enzymes/tissue_mass.txt")
Before the data can be analysed it is important to clean up the data file. I won’t explain each step; they can be figured out by examining the different functions. The main steps occurring below are columns being broken up to make new columns (the separate function), and columns being combined to make a UNIQUE_SAMPLE_ID value. Time data can also be tricky to deal with in R, so a number of data manipulation steps are used to make sure that time is read properly.
#--- data preparation/manipulation ---#
ldh2 <- ldh %>%
clean_names() %>%
mutate(muscle_type = str_replace(muscle_type, " ", ".")) %>%
unite("UNIQUE_SAMPLE_ID", c(fish_id,temperature,sample_index), sep="_", remove = FALSE) %>%
separate(creation_time, into=c('DATE','TIME'), sep = " ", remove = FALSE) %>%
arrange(sample_id_1, DATE, TIME)
ldh3 <- ldh2 %>%
mutate(TIME = hms(ldh2$TIME)) %>%
mutate(TIME = chron(times=ldh2$TIME)) %>%
arrange(TIME) %>%
group_by(UNIQUE_SAMPLE_ID, sample_id_1) %>%
mutate(TIME_DIFF = TIME - first(TIME)) %>%
filter(TIME != first(TIME)) %>%
ungroup() %>%
mutate(TIME_DIFF_SECS = period_to_seconds(hms(TIME_DIFF))) %>%
mutate(MINUTES = TIME_DIFF_SECS/60) %>%
mutate(MINUTES = round(MINUTES, digits = 2)) %>%
dplyr::rename(CUVETTE = sample_id_1) %>%
mutate(REGION = substr(fish_id, 1, 1 ),
POPULATION = substr(fish_id, 2, 4),
SAMPLE_NO = substr(fish_id, 5, 7)) %>%
mutate(REGION = case_when( REGION =="L"~ "Leading",
REGION == "C" ~ "Core",
TRUE ~ "na")) Next select data points will be removed. Data points have been checked using previously written app script that allows the user to look at the plotted points of samples that were run. Because we are interested in the slope, points that plateaued were removed from the analysis, as it signifies that the reaction ran out of ‘fuel’ to use. For LDH ‘fuel’ would refer to NADH, for CS ‘fuel’ would refer to Oxaloacetic acid. Samples that were removed were placed into one of five different groups, that can be found below:
grp1 <- c("CSUD008_20_1","CSUD008_20_2","CSUD008_20_3","CSUD008_20_4","CSUD008_20_5","CSUD008_20_6",
"CVLA047_50_1","CVLA047_50_2","CVLA047_50_3","CVLA047_50_4","CVLA047_50_5","CVLA047_50_6",
"CVLA046_50_1","CVLA046_50_2","CVLA046_50_3","CVLA046_50_4","CVLA046_50_5","CVLA046_50_6")
grp2 <- c("LCKM180_30_1","LCKM180_30_2","LCKM180_30_3","LCKM180_30_4","LCKM180_30_5","LCKM180_30_6",
"LKES172_50_1","LKES172_50_2","CLKES172_50_3","LKES172_50_4","LKES172_50_5","LKES172_50_6",
"LCHA114_50_1","LCHA114_50_2","LCHA114_50_3","LCHA114_50_4","LCHA114_50_5","LCHA114_50_6",
"CSUD074_50_1","CSUD074_50_2","CSUD074_50_3","CSUD074_50_4","CSUD074_50_5","CSUD074_50_6")
grp3 <- c("LCKM165_50_1","LCKM165_50_2","LCKM165_50_3","LCKM165_50_4","LCKM165_50_5","LCKM165_50_6",
"LCKM163_50_1","LCKM163_50_2","CLCKM163_50_3","LCKM163_50_4","LCKM163_50_5","LCKM163_50_6",
"CTON068_50_1","CTON068_50_2","CTON068_50_3","CTON068_50_4","CTON068_50_5","CTON068_50_6",
"CVLA104_50_1","CVLA104_50_2","CVLA104_50_3","CVLA104_50_4","CVLA104_50_5","CVLA104_50_6")
grp4 <- c("LCHA135_50_1","LCHA135_50_2","LCHA135_50_3","LCHA135_50_4","LCHA135_50_5","LCHA135_50_6",
"CTON069_50_1","CTON069_50_2","CCTON069_50_3","CTON069_50_4","CTON069_50_5","CTON069_50_6",
"CVLA045_50_1","CVLA045_50_2","CVLA045_50_3","CVLA045_50_4","CVLA045_50_5","CVLA045_50_6")
grp5 <- c("CSUD014_50_1","CSUD014_50_2","CSUD014_50_3","CSUD014_50_4","CSUD014_50_5","CSUD014_50_6",
          "CTON110_50_1","CTON110_50_2","CCTON110_50_3","CTON110_50_4","CTON110_50_5","CTON110_50_6")
For some samples entire runs on certain cuvettes were poor. These samples are removed below, as well as the samples from each grp outlined above:
ldh3.filtered <- ldh3 %>%
filter(!(UNIQUE_SAMPLE_ID %in% c("LCKM154_20_1",
"LKES143_30_3",
"LKES143_20_2",
"CSUD010_40_2"))) %>%
group_by(UNIQUE_SAMPLE_ID) %>%
arrange(UNIQUE_SAMPLE_ID, TIME) %>%
filter(!(UNIQUE_SAMPLE_ID %in% grp1 & row_number() > (n() - 1))) %>%
filter(!(UNIQUE_SAMPLE_ID %in% grp2 & row_number() > (n() - 2))) %>%
filter(!(UNIQUE_SAMPLE_ID %in% grp3 & row_number() > (n() - 3))) %>%
filter(!(UNIQUE_SAMPLE_ID %in% grp4 & row_number() > (n() - 4))) %>%
filter(!(UNIQUE_SAMPLE_ID %in% grp5 & row_number() > (n() - 5))) %>%
ungroup() %>%
  mutate(UNIQUE_SAMPLE_ID = str_sub(UNIQUE_SAMPLE_ID, end = -3))
Great! Now we have all the data points that we want to keep. However, the data needs to be manipulated in a way that lets us pull out slopes from the absorbance readings, and then calculate enzyme activity based on these slopes. This will involve a number of steps.
Step1 will produce a data frame containing the slope obtained for cuvettes 1-3 for each sample run at each experimental temperature.
LDH_activity <- ldh3.filtered %>%
group_by(UNIQUE_SAMPLE_ID, CUVETTE) %>%
do({
mod = lm(result ~ MINUTES, data = .)
data.frame(Intercept = coef(mod)[1],
Slope = coef(mod)[2],
r2 = summary(mod)$adj.r.squared)
}) %>%
ungroup() %>%
filter(CUVETTE != ("6"))%>%
filter(CUVETTE != ("4"))%>%
filter(CUVETTE != ("5")) %>%
  filter(Slope <= 0)
Step2 will calculate the mean slope for cuvettes 1-3.
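The Step2 code does not appear in this document, yet a data frame named `LDH_activity_means` is used in Step4. A minimal sketch consistent with that later usage (where the mean slope is the sixth column) might be:

```r
# Hypothetical reconstruction: average the cuvette 1-3 slopes per sample,
# keeping the mean as an extra column so it can be joined in Step4
LDH_activity_means <- LDH_activity %>%
  group_by(UNIQUE_SAMPLE_ID) %>%
  mutate(Mean = mean(Slope)) %>%
  ungroup()
```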
Step3 will calculate the background activity level by measuring the slope from cuvette 5 (positive control).
LDH_background <- ldh3 %>%
group_by(UNIQUE_SAMPLE_ID, CUVETTE) %>%
do({
mod = lm(result ~ MINUTES, data = .)
data.frame(Intercept = coef(mod)[1],
Slope = coef(mod)[2])
}) %>%
ungroup() %>%
filter(CUVETTE == ("5")) %>%
dplyr::rename(Background = Slope) %>%
  mutate(UNIQUE_SAMPLE_ID = str_sub(UNIQUE_SAMPLE_ID, end = -3))
Step4 will merge the data frames created above containing the mean slopes and the background slopes.
final_table <- LDH_activity %>%
full_join(distinct(LDH_activity_means[,c(1,6)]), by = "UNIQUE_SAMPLE_ID") %>%
full_join(LDH_background[,c(1,4)], by = "UNIQUE_SAMPLE_ID")
final_table$Mean[duplicated(final_table$Mean)] <- ""
final_table$Background[duplicated(final_table$Background)] <- ""
final_table <- final_table %>%
mutate(Mean = as.numeric(Mean),
Background = as.numeric(Background),
         Background_perc = Background/Mean)
Step5 is where enzyme activity levels are calculated. See further details in the manuscript (doi: xxx). Within this step background activity is taken into account and subtracted from the sample slope where background activity was more than 5% of the sample slope.
ldh.data <- final_table %>%
select(c(UNIQUE_SAMPLE_ID, Mean, Background, Background_perc)) %>%
mutate(Mean = as.numeric(Mean),
Background = as.numeric(Background),
Background_perc = as.numeric(Background_perc)) %>%
mutate(Background2 = case_when(Background_perc <= 0.05 ~ 0,
TRUE ~ Background),
LDH_ABSORBANCE = Mean - Background2) %>%
drop_na() %>%
inner_join(select(ldh3.filtered, c(UNIQUE_SAMPLE_ID, REGION, POPULATION, temperature, fish_id)), by ="UNIQUE_SAMPLE_ID") %>%
inner_join(tissue.mass, by = "fish_id") %>%
mutate(TISSUE_MASS_CENTERED = scale(TISSUE_MASS, center = TRUE, scale = FALSE)) %>%
distinct(UNIQUE_SAMPLE_ID, REGION, POPULATION, .keep_all = TRUE) %>%
mutate(PATH_LENGTH = 1,
EXTINCTION_COEFFICIENT = 6.22,
TISSUE_CONCENTRATION = 0.005,
ASSAY_VOL = 2.975,
SAMPLE_VOL = 0.025,
LDH_ACTIVITY = ((LDH_ABSORBANCE/(PATH_LENGTH*EXTINCTION_COEFFICIENT*TISSUE_CONCENTRATION))*(ASSAY_VOL/SAMPLE_VOL))*-1) %>%
filter(LDH_ACTIVITY >=0) %>%
filter(fish_id != "CVLA047")

By the end of this stage you should have a data frame that includes a column called LDH_ACTIVITY along with the necessary metadata - this data frame will be used to perform the statistical analysis.
ggplot(ldh.data, aes(x =as.numeric(temperature), y= LDH_ACTIVITY, color = REGION)) +
geom_point() + geom_smooth(method = "lm", se=FALSE)

The model was fit using the glm and later the glmmTMB package in R. A number of different models were tested to determine which hypothesis and associated variables best predicted LDH activity. Model fit was examined using AICc, BIC, and r-squared values. Additional models were examined via the validation diagnostics provided by the performance and DHARMa packages in R.
#--- base model ---#
ldh.model.1 <- glm(LDH_ACTIVITY ~ 1 + REGION*temperature + TISSUE_MASS_CENTERED,
family=gaussian(),
data = ldh.data)

summary of ldh.model.1:
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | -98.8473323 | 12.7538631 | -7.7503836 | 0.0000000 |
| REGIONLeading | -15.7431756 | 19.0212806 | -0.8276612 | 0.4092512 |
| temperature | 6.3664611 | 0.3464135 | 18.3782116 | 0.0000000 |
| TISSUE_MASS_CENTERED | -1.4109245 | 0.6197670 | -2.2765402 | 0.0243089 |
| REGIONLeading:temperature | 0.4104065 | 0.5148700 | 0.7971070 | 0.4267197 |
summary of ldh.model.2 (the same model without TISSUE_MASS_CENTERED):
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | -100.6938247 | 12.9128467 | -7.7979571 | 0.0000000 |
| REGIONLeading | -12.6091234 | 19.2468460 | -0.6551267 | 0.5134386 |
| temperature | 6.3664611 | 0.3514432 | 18.1151935 | 0.0000000 |
| REGIONLeading:temperature | 0.4183038 | 0.5223337 | 0.8008364 | 0.4245549 |
| model | df | AICc | BIC | r2 |
|---|---|---|---|---|
| ldh.model.1 | 6 | 1466.925 | 1484.268 | 0.818999 |
| ldh.model.2 | 5 | 1470.020 | 1484.547 | 0.813494 |
The model that contains TISSUE_MASS_CENTERED seems to do better than the model that leaves TISSUE_MASS_CENTERED out. Therefore we will move ahead with the model that contains TISSUE_MASS_CENTERED as a covariate.
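The comparison table above can be reproduced along these lines (a sketch; `ldh.model.2` is assumed to be ldh.model.1 refit without the tissue-mass covariate, and AICc comes from the MuMIn package):

```r
# Sketch: refit without the tissue-mass covariate and compare fit metrics.
ldh.model.2 <- update(ldh.model.1, . ~ . - TISSUE_MASS_CENTERED)
MuMIn::AICc(ldh.model.1, ldh.model.2)  # small-sample corrected AIC
BIC(ldh.model.1, ldh.model.2)
performance::r2(ldh.model.1); performance::r2(ldh.model.2)
```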
Note that the linear model has already been created via model ldh.model.1 in the previous section.
#--- second order polynomial ---#
ldh.model.1.p2 <- glm(LDH_ACTIVITY ~ 1 + REGION*poly(temperature, 2) + TISSUE_MASS_CENTERED,
family=gaussian(),
data = ldh.data)
#--- third order polynomial ---#
ldh.model.1.p3 <- glm(LDH_ACTIVITY ~ 1 + REGION*poly(temperature, 3) + TISSUE_MASS_CENTERED,
family=gaussian(),
data = ldh.data)

| model | df | AICc | BIC | r2 |
|---|---|---|---|---|
| ldh.model.1 | 6 | 1466.925 | 1484.268 | 0.8230806 |
| ldh.model.1.p2 | 8 | 1461.007 | 1483.887 | 0.8351212 |
| ldh.model.1.p3 | 10 | 1460.160 | 1488.447 | 0.8410910 |
From our model comparison we can see that the model that runs temperature as a second order polynomial performs the best. Therefore, moving forward we will use the second order polynomial model.
Fish were repeatedly sampled over four different temperatures, so this repeated sampling needs to be accounted for by including random factors in the model. There are a number of options for random factors, including: 1) accounting for repeated sampling of individuals, 2) accounting for repeated sampling of individuals nested within population, and 3) accounting for repeated sampling of individuals and populations without nesting. All three models will be run and compared.
ldh.model.1.p2a <- glmmTMB(LDH_ACTIVITY ~ 1 + REGION*poly(temperature, 2) + TISSUE_MASS_CENTERED + (1|fish_id),
family=gaussian(),
data = ldh.data,
REML = TRUE)
ldh.model.1.p2b <- glmmTMB(LDH_ACTIVITY ~ 1 + REGION*poly(temperature, 2) + TISSUE_MASS_CENTERED + (1|POPULATION/fish_id),
family=gaussian(),
data = ldh.data,
REML = TRUE)
ldh.model.1.p2c <- glmmTMB(LDH_ACTIVITY ~ 1 + REGION*poly(temperature, 2) + TISSUE_MASS_CENTERED + (1|fish_id) + (1 + REGION|POPULATION),
family=gaussian(),
data = ldh.data,
REML = TRUE) # convergence problem

| model | df | AICc | BIC | r2m | r2c |
|---|---|---|---|---|---|
| ldh.model.1.p2a | 9 | 1343.136 | 1368.736 | 0.8267269 | 0.8267269 |
| ldh.model.1.p2b | 10 | 1345.439 | 1373.726 | 0.8267284 | 0.8267284 |
| ldh.model.1.p2c | 12 | NA | NA | 0.8267268 | 0.8267268 |
Model ldh.model.1.p2a appears to be the best model; however, there is little difference between the models regardless of how the random factors are arranged.
The ldh.model.1.p2a model looks like it performs well.
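The DHARMa output below can be generated along these lines (a sketch; `simulateResiduals` and `testResiduals` are from the DHARMa package):

```r
# Sketch: simulate residuals from the fitted model and run the standard
# uniformity, dispersion, and outlier tests reported below.
library(DHARMa)
simulationOutput <- simulateResiduals(ldh.model.1.p2a, n = 250)
plot(simulationOutput)          # QQ plot and residual-vs-predicted panels
testResiduals(simulationOutput) # KS uniformity, dispersion, outlier tests
```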
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.228 0.048 0.004 0.012 0.42 0.232 0.516 0.58 0.472 0.688 0.712 0.672 0.828 0.836 0.956 0.776 0.368 0.212 0.952 0.264 ...
## $uniformity
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.091946, p-value = 0.1665
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 0.94013, p-value = 0.784
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test with approximate
## expectations
##
## data: simulationOutput
## outliers at both margin(s) = 1, observations = 147, p-value = 1
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
## 0.0001722152 0.0373181816
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 )
## 0.006802721
The model performs well and passes validation checks.
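The tables below (coefficients, Wald chi-square tests, 95% confidence intervals, and R2 values) can be generated along these lines (a sketch; the car and performance packages are assumed):

```r
summary(ldh.model.1.p2a)          # coefficient table (z values)
car::Anova(ldh.model.1.p2a)       # Wald chi-square tests
confint(ldh.model.1.p2a)          # 95% confidence intervals
performance::r2(ldh.model.1.p2a)  # marginal and conditional R2
```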
| Estimate | StdError | Zvalue | Pvalue | |
|---|---|---|---|---|
| (Intercept) | 124.571841 | 6.683209 | 18.6395254 | 0.0000000 |
| REGIONLeading | -1.451179 | 9.992478 | -0.1452272 | 0.8845315 |
| poly(temperature, 2)1 | 861.389304 | 27.180290 | 31.6916893 | 0.0000000 |
| poly(temperature, 2)2 | 84.094574 | 27.252744 | 3.0857288 | 0.0020305 |
| TISSUE_MASS_CENTERED | -1.403728 | 1.045435 | -1.3427216 | 0.1793622 |
| REGIONLeading:poly(temperature, 2)1 | 57.643663 | 40.481701 | 1.4239437 | 0.1544628 |
| REGIONLeading:poly(temperature, 2)2 | 42.994558 | 40.422380 | 1.0636325 | 0.2874952 |
| Chisq | Df | Pr(>Chisq) | |
|---|---|---|---|
| REGION | 0.0209293 | 1 | 0.8849717 |
| poly(temperature, 2) | 1970.2750833 | 2 | 0.0000000 |
| TISSUE_MASS_CENTERED | 1.8029012 | 1 | 0.1793622 |
| REGION:poly(temperature, 2) | 3.1744294 | 2 | 0.2044944 |
| 2.5 % | 97.5 % | Estimate | |
|---|---|---|---|
| (Intercept) | 111.472992 | 137.6706893 | 124.571841 |
| REGIONLeading | -21.036076 | 18.1337176 | -1.451179 |
| poly(temperature, 2)1 | 808.116915 | 914.6616938 | 861.389304 |
| poly(temperature, 2)2 | 30.680179 | 137.5089698 | 84.094574 |
| TISSUE_MASS_CENTERED | -3.452743 | 0.6452869 | -1.403728 |
| REGIONLeading:poly(temperature, 2)1 | -21.699014 | 136.9863399 | 57.643663 |
| REGIONLeading:poly(temperature, 2)2 | -36.231850 | 122.2209668 | 42.994558 |
| Std.Dev.(Intercept)|fish_id | 20.967247 | 35.9986249 | 27.473480 |
| R2_conditional | R2_marginal | optional |
|---|---|---|
| 0.9395659 | 0.8267269 | FALSE |
ldh.model.1.p2a %>% emtrends(var = "temperature", type = "response") %>% pairs(by = "temperature") %>% summary(by = NULL, adjust = "tukey", infer=TRUE)
The leftmost column of the table indicates that the slopes assume mean TISSUE_MASS_CENTERED values when comparing differences between latitudinal slopes.
ldh.model.1.p2a %>% emmeans(pairwise ~ temperature*REGION, type = "response") %>% pairs(by = "temperature") %>% summary(by = NULL, adjust = "tukey", infer=TRUE)

ldh.model.1.p2a %>% update(.~1+ REGION * as.factor(temperature) + TISSUE_MASS_CENTERED + (1|fish_id)) %>%
emmeans(~REGION*temperature, type = "response") %>% summary(infer=TRUE)

ldh.model.1.p2a %>% update(.~1+ REGION * as.factor(temperature) + TISSUE_MASS_CENTERED + (1|fish_id)) %>%
emmeans(~REGION*temperature, type = "response") %>% pairs(by ="REGION") %>% summary(infer=TRUE)

ldh.emm <- ldh.model.1.p2a %>% emmeans(~REGION*temperature)
eff_size(ldh.emm, sigma = sigma(ldh.model.1.p2a), edf=df.residual(ldh.model.1.p2a))

## contrast
## Core temperature35.1020408163265 - Leading temperature35.1020408163265
## effect.size SE df lower.CL upper.CL
## 0.291 0.538 145 -0.772 1.36
##
## sigma used for effect sizes: 20.11
## Confidence level used: 0.95
For initial details on the experiment performed please read the ReadMe file. In brief, Acanthochromis polyacanthus from two different regions on the Great Barrier Reef (GBR) were tested for metabolic performance at four different temperatures: 27\(^\circ\)C, 28.5\(^\circ\)C, 30\(^\circ\)C, and 31.5\(^\circ\)C. Fish used in this study were collected from two different regions, low- (i.e., Cairns) and high-latitude (i.e., Mackay); within each region fish were collected from a total of three different populations. After metabolic performance was tested, blood and tissue samples were collected. White muscle tissue samples were used to look at the relationship between activity and temperature in two different enzymes, Lactate Dehydrogenase (LDH; anaerobic) and Citrate Synthase (CS; aerobic). Enzyme activity was measured at four different temperatures: 20\(^\circ\)C, 30\(^\circ\)C, 40\(^\circ\)C, and 50\(^\circ\)C. Enzyme activity was measured using a spectrophotometer and wavelength absorption levels were recorded using the software program LabX.
Before beginning always make sure that you are working in the correct directory
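For example (the path shown is a placeholder):

```r
getwd()                          # print the current working directory
# setwd("path/to/your/project")  # uncomment and edit if it is not correct
```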
Now we can import the data. Two different data frames are being imported. The first has all the enzyme wavelength absorption data for each sample, while the tissue.mass data file contains information pertaining to the tissue sample that was used for each fish. Later on these two data frames will be merged.
cs <- read_delim("C:/Users/jc527762/OneDrive - James Cook University/PhD dissertation/Data/Chapter1_LocalAdaptation/enzymes/CS_LocalAdapt6.txt",
delim = "\t", escape_double = FALSE,
col_types = cols(...21 = col_skip(),
...22 = col_skip()), trim_ws = TRUE) %>%
clean_names() %>%
mutate(creation_time = as.POSIXct(creation_time, format = "%d/%m/%Y %H:%M:%S"))
tissue.mass <- read.delim("C:/Users/jc527762/OneDrive - James Cook University/PhD dissertation/Data/Chapter1_LocalAdaptation/enzymes/tissue_mass.txt") %>%
dplyr::rename(FISH_ID = fish_id)

Before the data can be analysed it is important to clean up the data file. I won't explain each step; these can be figured out by examining the different functions. The main steps occurring below are columns being broken up to make new columns (the separate function), and columns being combined to make a UNIQUE_SAMPLE_ID value. Time data can also be tricky to deal with in R, so a number of data manipulation steps are used to make sure that time is read properly.
#--- data preparation/manipulation ---#
cs2 <- cs %>%
mutate(muscle_type = str_replace(muscle_type, " ", ".")) %>%
unite("UNIQUE_SAMPLE_ID", c(fish_id,temperature,sample_index), sep="_", remove = FALSE) %>%
separate(creation_time, into=c('DATE','TIME'), sep = " ", remove = FALSE) %>%
arrange(sample_id_1, DATE, TIME)
cs3 <- cs2 %>%
mutate(DATE = as.Date(creation_time),
TIME = format(creation_time, "%H:%M:%S")) %>%
mutate(TIME = hms(cs2$TIME)) %>%
mutate(TIME = chron(times=cs2$TIME)) %>%
arrange(TIME) %>%
group_by(UNIQUE_SAMPLE_ID, sample_id_1) %>%
mutate(TIME_DIFF = TIME - first(TIME)) %>%
filter(TIME != first(TIME)) %>%
ungroup() %>%
mutate(TIME_DIFF_SECS = period_to_seconds(hms(TIME_DIFF))) %>%
mutate(MINUTES = TIME_DIFF_SECS/60) %>%
mutate(MINUTES = round(MINUTES, digits = 2)) %>%
dplyr::rename(CUVETTE = sample_id_1) %>%
mutate(REGION = substr(fish_id, 1, 1 ),
POPULATION = substr(fish_id, 2, 4),
SAMPLE_NO = substr(fish_id, 5, 7)) %>%
mutate(REGION = case_when( REGION =="L"~ "Leading",
REGION == "C" ~ "Core",
TRUE ~ "na")) Next select data points will be removed. Data points have been checked using previously written app script that allows the user to look at the plotted points of samples that were run. Because we are interested in the slope, points that plateaued were removed from the analysis, as it signifies that the reaction ran out of ‘fuel’ to use. For LDH ‘fuel’ would refer to NADH, for CS ‘fuel’ would refer to Oxaloacetic acid. Samples that were removed were placed into one of five different groups. No reactions reached plateau for CS, however there were a number of runs and/or cuvettes where data quailty was poor.
cs3.filtered <- cs3 %>%
dplyr::rename(TEMPERATURE = temperature,
FISH_ID = fish_id) %>%
filter(!(TEMPERATURE == "50" & FISH_ID == "LCKM158")) %>%
filter(!(TEMPERATURE == "50" & FISH_ID == "CSUD010")) %>%
filter(!(TEMPERATURE == "40" & FISH_ID == "CSUD018")) %>%
filter(!(TEMPERATURE == "20" & FISH_ID == "CTON061")) %>%
filter(!(TEMPERATURE == "50" & FISH_ID == "CTON065")) %>%
filter(!(FISH_ID == "CTON069")) %>%
filter(!(UNIQUE_SAMPLE_ID %in% c("CTON060_50_3",
"CTON061_30_3",
"CTON061_40_3",
"CTON061_50_2",
"CTON061_50_3")))%>%
mutate(UNIQUE_SAMPLE_ID = str_sub(UNIQUE_SAMPLE_ID, end = -3))

Great! Now we have all the data points that we want to keep. However, the data needs to be manipulated in a way that allows us to pull slopes out of the absorption readings, and then calculate enzyme activity based on these slopes. This will involve a number of steps.
Step 1 will produce a data frame that provides the slope obtained for cuvettes 1-3 for each sample run at each experimental temperature.
CS_activity <- cs3.filtered %>%
dplyr::group_by(UNIQUE_SAMPLE_ID, CUVETTE) %>%
do({
mod = lm(result ~ MINUTES, data = .)
data.frame(Intercept = coef(mod)[1],
Slope = coef(mod)[2],
r2 = summary(mod)$adj.r.squared)
}) %>%
dplyr::ungroup() %>%
filter(CUVETTE != ("6"))%>%
filter(CUVETTE != ("4"))%>%
filter(CUVETTE != ("5"))

Step 2 will calculate the mean slope for cuvettes 1-3.
Step 3 will calculate the background activity level by measuring the slope from cuvette 5 (positive control).
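The Step 2 and Step 3 code does not appear above; here is a minimal sketch of how `CS_activity_means` and `CS_background` (both referenced in Step 4) might be constructed, mirroring the LDH workflow earlier in this document:

```r
# Step 2 (sketch, assumed): per-sample mean of the cuvette 1-3 slopes,
# appended as a new column so that Mean ends up as column 6.
CS_activity_means <- CS_activity %>%
  group_by(UNIQUE_SAMPLE_ID) %>%
  mutate(Mean = mean(Slope)) %>%
  ungroup()

# Step 3 (sketch, assumed): background slope from cuvette 5 (positive control),
# fit on the unfiltered cs3 data as in the LDH background step.
CS_background <- cs3 %>%
  group_by(UNIQUE_SAMPLE_ID, CUVETTE) %>%
  do({
    mod = lm(result ~ MINUTES, data = .)
    data.frame(Intercept = coef(mod)[1],
               Slope = coef(mod)[2])
  }) %>%
  ungroup() %>%
  filter(CUVETTE == ("5")) %>%
  dplyr::rename(Background = Slope) %>%
  mutate(UNIQUE_SAMPLE_ID = str_sub(UNIQUE_SAMPLE_ID, end = -3))
```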
Step 4 will merge the data frames that you created with the mean slopes and the background slopes.
final_table <- CS_activity %>%
full_join(distinct(CS_activity_means[,c(1,6)]), by = "UNIQUE_SAMPLE_ID") %>%
full_join(CS_background[,c(1,4)], by = "UNIQUE_SAMPLE_ID")
final_table$Mean[duplicated(final_table$Mean)] <- ""
final_table$Background[duplicated(final_table$Background)] <- ""
final_table <- final_table %>%
mutate(Mean = as.numeric(Mean),
Background = as.numeric(Background),
Background_perc = Background/Mean)

Step 5 is where enzyme activity levels are calculated; see further details in the manuscript (doi: xxx). Within this step the background activity level is taken into account and subtracted from sample slopes where background activity exceeded 5% of the sample slope.
CS.data <- final_table %>%
select(c(UNIQUE_SAMPLE_ID, Mean, Background, Background_perc)) %>%
mutate(Mean = as.numeric(Mean),
Background = as.numeric(Background),
Background_perc = as.numeric(Background_perc)) %>%
mutate(Background2 = case_when(Background_perc <= 0.05 ~ 0,
TRUE ~ Background),
CS_ABSORBANCE = Mean - Background2) %>%
inner_join(select(cs3.filtered, c(UNIQUE_SAMPLE_ID, REGION, POPULATION, TEMPERATURE, FISH_ID)), by ="UNIQUE_SAMPLE_ID") %>%
inner_join(tissue.mass, by = "FISH_ID") %>%
mutate(TISSUE_MASS_CENTERED = scale(TISSUE_MASS, center = TRUE, scale = FALSE)) %>%
distinct(UNIQUE_SAMPLE_ID, REGION, POPULATION, .keep_all = TRUE) %>%
mutate(REGION = factor(REGION),
PATH_LENGTH = 1,
EXTINCTION_COEFFICIENT = 13.6,
TISSUE_CONCENTRATION = 0.01,
ASSAY_VOL = 0.930,
SAMPLE_VOL = 0.020,
CS_ACTIVITY = ((CS_ABSORBANCE/(PATH_LENGTH*EXTINCTION_COEFFICIENT*TISSUE_CONCENTRATION))*(ASSAY_VOL/SAMPLE_VOL))) %>%
filter(FISH_ID != "CVLA047")

By the end of this stage you should have a data frame that includes a column called CS_ACTIVITY along with the necessary metadata - this data frame will be used to perform the statistical analysis.
ggplot(CS.data, aes(x =as.numeric(TEMPERATURE), y= CS_ACTIVITY, color = REGION)) +
geom_point() + geom_smooth(method = "lm", se=FALSE)

ggplot(CS.data, aes(x = CS_ACTIVITY, fill = TEMPERATURE, color = TEMPERATURE)) +
geom_density(alpha =0.5, position = "identity")

ggplot(CS.data, aes(x =TISSUE_MASS_CENTERED, y= CS_ACTIVITY, color = REGION)) +
geom_point() + geom_smooth(method = "lm", se=FALSE)

The model was fit using the glm and later the glmmTMB package in R. A number of different models were tested to determine which hypothesis and associated variables best predicted CS activity. Model fit was examined using AICc, BIC, and r-squared values. Additional models were examined via the validation diagnostics provided by the performance and DHARMa packages in R.
#--- base model ---#
cs.model.1 <- glm(CS_ACTIVITY ~ 1 + REGION*TEMPERATURE + TISSUE_MASS_CENTERED,
family=gaussian(),
data = CS.data)

summary of cs.model.1:
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | -1.3917427 | 0.4681525 | -2.9728401 | 0.0035409 |
| REGIONLeading | -0.6551419 | 0.6711737 | -0.9761138 | 0.3308934 |
| TEMPERATURE | 0.1491176 | 0.0128965 | 11.5626489 | 0.0000000 |
| TISSUE_MASS_CENTERED | 0.0065822 | 0.0217986 | 0.3019550 | 0.7631882 |
| REGIONLeading:TEMPERATURE | 0.0372828 | 0.0183990 | 2.0263512 | 0.0448568 |
summary of cs.model.2 (the same model without TISSUE_MASS_CENTERED):
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | -1.3792002 | 0.4646214 | -2.968439 | 0.0035836 |
| REGIONLeading | -0.6734697 | 0.6660086 | -1.011203 | 0.3138573 |
| TEMPERATURE | 0.1489822 | 0.0128421 | 11.601054 | 0.0000000 |
| REGIONLeading:TEMPERATURE | 0.0374138 | 0.0183274 | 2.041412 | 0.0432978 |
| model | df | AICc | BIC | r2 |
|---|---|---|---|---|
| cs.model.1 | 6 | 414.3969 | 430.9192 | 0.7286043 |
| cs.model.2 | 5 | 412.2926 | 426.1464 | 0.7299815 |
Based on AICc and BIC the model without TISSUE_MASS_CENTERED (cs.model.2) actually performs slightly better; however, the differences are small, so we will move ahead with the model that retains TISSUE_MASS_CENTERED as a covariate, consistent with the LDH analysis.
Note that the linear model has already been created via model cs.model.1 in the previous section.
#--- second order polynomial ---#
cs.model.1.p2 <- glm(CS_ACTIVITY ~ 1 + REGION*poly(TEMPERATURE, 2) + TISSUE_MASS_CENTERED,
family=gaussian(),
data = CS.data)
#--- third order polynomial ---#
cs.model.1.p3 <- glm(CS_ACTIVITY ~ 1 + REGION*poly(TEMPERATURE, 3) + TISSUE_MASS_CENTERED,
family=gaussian(),
data = CS.data)

| model | df | AICc | BIC | r2 |
|---|---|---|---|---|
| cs.model.1 | 6 | 414.3969 | 430.9192 | 0.7347879 |
| cs.model.1.p2 | 8 | 417.7056 | 439.4558 | 0.7372215 |
| cs.model.1.p3 | 10 | 421.9615 | 448.7881 | 0.7380345 |
From our model comparison we can see that the model that runs temperature as a linear model performs the best. Therefore, moving forward we will use the linear model.
Fish were repeatedly sampled over four different temperatures, so this repeated sampling needs to be accounted for by including random factors in the model. There are a number of options for random factors, including: 1) accounting for repeated sampling of individuals, 2) accounting for repeated sampling of individuals nested within population, and 3) accounting for repeated sampling of individuals and populations without nesting. All three models will be run and compared.
cs.model.1a <- glmmTMB(CS_ACTIVITY ~ 1 + REGION*TEMPERATURE + TISSUE_MASS_CENTERED + (1|FISH_ID),
family=gaussian(),
data = CS.data,
REML = TRUE)
cs.model.1b <- glmmTMB(CS_ACTIVITY ~ 1 + REGION*TEMPERATURE + TISSUE_MASS_CENTERED + (1|POPULATION/FISH_ID),
family=gaussian(),
data = CS.data,
REML = TRUE)
cs.model.1c <- glmmTMB(CS_ACTIVITY ~ 1 + REGION*TEMPERATURE + TISSUE_MASS_CENTERED + (1|FISH_ID) + (1 + REGION|POPULATION),
family=gaussian(),
data = CS.data,
REML = TRUE)

| model | df | AICc | BIC | r2m | r2c |
|---|---|---|---|---|---|
| cs.model.1a | 7 | 368.4369 | 387.5916 | 0.7237186 | 0.7237186 |
| cs.model.1b | 8 | 369.5447 | 391.2949 | 0.7114656 | 0.7114656 |
| cs.model.1c | 10 | 373.7411 | 400.5677 | 0.7110716 | 0.7110716 |
Model cs.model.1a appears to be the best model; however, there is little difference between the models regardless of how the random factors are arranged.
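As with the LDH model, the DHARMa output below can be generated along these lines (a sketch using the DHARMa package):

```r
# Sketch: simulate residuals and run uniformity, dispersion, and outlier tests.
simulationOutput <- DHARMa::simulateResiduals(cs.model.1a, n = 250)
DHARMa::testResiduals(simulationOutput)
```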
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.724 0.74 0.66 0.72 0.156 0.196 0.08 0.068 0.36 0.464 0.792 0.784 0.588 0.416 0.04 0.484 0.828 0.64 0.2 0.168 ...
## $uniformity
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.065231, p-value = 0.6377
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 0.97307, p-value = 0.984
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test with approximate
## expectations
##
## data: simulationOutput
## outliers at both margin(s) = 1, observations = 130, p-value = 1
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
## 0.0001947334 0.0421127390
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 )
## 0.007692308
The cs.model.1a model looks okay... let's try a link transformation to see if we can get any improvement.
cs.model.1a <- glmmTMB(CS_ACTIVITY ~ 1 + REGION*TEMPERATURE + TISSUE_MASS_CENTERED + (1|FISH_ID), # default (identity link)
family=gaussian(link = "identity"),
data = CS.data,
REML = TRUE)
cs.model.1a.log <- glmmTMB(CS_ACTIVITY ~ 1 + REGION*TEMPERATURE + TISSUE_MASS_CENTERED + (1|FISH_ID),
family=gaussian(link="log"),
data = CS.data,
REML = TRUE)
cs.model.1a.inv <- glmmTMB(CS_ACTIVITY ~ 1 + REGION*TEMPERATURE + TISSUE_MASS_CENTERED + (1|FISH_ID),
family=gaussian(link="inverse"),
data = CS.data,
REML = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.724 0.74 0.66 0.72 0.156 0.196 0.08 0.068 0.36 0.464 0.792 0.784 0.588 0.416 0.04 0.484 0.828 0.64 0.2 0.168 ...
## $uniformity
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.065231, p-value = 0.6377
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 0.97307, p-value = 0.984
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test with approximate
## expectations
##
## data: simulationOutput
## outliers at both margin(s) = 1, observations = 130, p-value = 1
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
## 0.0001947334 0.0421127390
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 )
## 0.007692308
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.676 0.864 0.752 0.664 0.036 0.144 0.112 0.088 0.1 0.524 0.836 0.748 0.64 0.568 0.06 0.256 0.896 0.58 0.012 0.14 ...
## $uniformity
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.091692, p-value = 0.2244
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 0.98328, p-value = 0.944
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test with approximate
## expectations
##
## data: simulationOutput
## outliers at both margin(s) = 0, observations = 130, p-value = 0.6309
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
## 0.00000000 0.02797718
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 )
## 0
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.512 0.856 0.864 0.652 0.032 0.104 0.136 0.08 0.064 0.396 0.848 0.516 0.6 0.564 0.048 0.148 0.888 0.524 0.008 0.144 ...
## $uniformity
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.11877, p-value = 0.05107
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 0.58039, p-value = 0.408
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test with approximate
## expectations
##
## data: simulationOutput
## outliers at both margin(s) = 0, observations = 130, p-value = 0.6309
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
## 0.00000000 0.02797718
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 )
## 0
From looking at the different models it looks like the model with the log-link function performs the best. In the DHARMa validation test we can see that one of the quantile deviations is violated. Because the model passes all the other data validations relatively well we could move on with the log-link model. However, previously we showed that the 2nd and 3rd order polynomials also performed quite well, and we know the LDH model was not linear. So before we choose our final model, let's see what the 2nd and 3rd order polynomials look like with a log-link.
cs.model.1a.log <- glmmTMB(CS_ACTIVITY ~ 1 + REGION*TEMPERATURE + TISSUE_MASS_CENTERED + (1|FISH_ID),
family=gaussian(link="log"),
data = CS.data,
REML = FALSE)
cs.model.1a.log.p2 <- glmmTMB(CS_ACTIVITY ~ 1 + REGION*poly(TEMPERATURE, 2) + TISSUE_MASS_CENTERED + (1|FISH_ID),
family=gaussian(link="log"),
data = CS.data,
REML = FALSE)
cs.model.1a.log.p3 <- glmmTMB(CS_ACTIVITY ~ 1 + REGION*poly(TEMPERATURE, 3) + TISSUE_MASS_CENTERED + (1|FISH_ID),
family=gaussian(link="log"),
data = CS.data,
REML = FALSE)

| model | df | AICc | BIC | r2m | r2c |
|---|---|---|---|---|---|
| cs.model.1a.log | 7 | 319.7416 | 338.8963 | 0.3504577 | 0.3504577 |
| cs.model.1a.log.p2 | 9 | 296.4647 | 320.7725 | 0.4639599 | 0.4639599 |
| cs.model.1a.log.p3 | 11 | 300.4024 | 329.7080 | 0.4653824 | 0.4653824 |
From this model comparison we can see that the 2nd order polynomial with the log-link seems to be the best model. Let’s look at our model validations
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.84 0.864 0.696 0.724 0.04 0.096 0.064 0.1 0.164 0.484 0.8 0.9 0.624 0.444 0.072 0.36 0.896 0.636 0.036 0.072 ...
## $uniformity
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.070769, p-value = 0.533
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 1.087, p-value = 0.616
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test with approximate
## expectations
##
## data: simulationOutput
## outliers at both margin(s) = 1, observations = 130, p-value = 1
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
## 0.0001947334 0.0421127390
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 )
## 0.007692308
Validations look great! Moving ahead with the quadratic log-link model.
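The coefficient, Wald chi-square, confidence-interval, and R2 tables that follow are standard glmmTMB summaries. A sketch of the calls that generate them, assuming the car and performance packages are loaded (the exact calls are not echoed in the document):

```r
library(car)          # Anova()
library(performance)  # r2()

summary(cs.model.1a.log.p2)  # Estimate / StdError / Zvalue / Pvalue table
Anova(cs.model.1a.log.p2)    # Wald chi-square test for each fixed effect
confint(cs.model.1a.log.p2)  # 95% confidence intervals for all parameters
r2(cs.model.1a.log.p2)       # conditional and marginal R2
```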
| Estimate | StdError | Zvalue | Pvalue | |
|---|---|---|---|---|
| (Intercept) | 1.1984073 | 0.0564498 | 21.2295969 | 0.0000000 |
| REGIONLeading | 0.1356647 | 0.0820867 | 1.6526997 | 0.0983920 |
| poly(TEMPERATURE, 2)1 | 5.3642867 | 0.2711131 | 19.7861598 | 0.0000000 |
| poly(TEMPERATURE, 2)2 | -0.8451058 | 0.2110951 | -4.0034360 | 0.0000624 |
| TISSUE_MASS_CENTERED | 0.0024962 | 0.0082421 | 0.3028589 | 0.7619974 |
| REGIONLeading:poly(TEMPERATURE, 2)1 | 0.4346926 | 0.3715558 | 1.1699254 | 0.2420310 |
| REGIONLeading:poly(TEMPERATURE, 2)2 | 0.1150385 | 0.2844006 | 0.4044947 | 0.6858490 |
| Chisq | Df | Pr(>Chisq) | |
|---|---|---|---|
| REGION | 4.3163842 | 1 | 0.0377470 |
| poly(TEMPERATURE, 2) | 1234.7927952 | 2 | 0.0000000 |
| TISSUE_MASS_CENTERED | 0.0917235 | 1 | 0.7619974 |
| REGION:poly(TEMPERATURE, 2) | 3.5245189 | 2 | 0.1716566 |
| 2.5 % | 97.5 % | Estimate | |
|---|---|---|---|
| (Intercept) | 1.0877676 | 1.3090469 | 1.1984073 |
| REGIONLeading | -0.0252223 | 0.2965517 | 0.1356647 |
| poly(TEMPERATURE, 2)1 | 4.8329148 | 5.8956585 | 5.3642867 |
| poly(TEMPERATURE, 2)2 | -1.2588446 | -0.4313670 | -0.8451058 |
| TISSUE_MASS_CENTERED | -0.0136580 | 0.0186503 | 0.0024962 |
| REGIONLeading:poly(TEMPERATURE, 2)1 | -0.2935434 | 1.1629285 | 0.4346926 |
| REGIONLeading:poly(TEMPERATURE, 2)2 | -0.4423764 | 0.6724535 | 0.1150385 |
| Std.Dev.(Intercept)|FISH_ID | 0.1678718 | 0.2807799 | 0.2171060 |
| R2_conditional | R2_marginal | optional |
|---|---|---|
| 0.551021 | 0.4639599 | FALSE |
cs.model.1a.log.p2 %>% emtrends(var = "TEMPERATURE", type = "response") %>% pairs(by = "TEMPERATURE") %>% summary(by = NULL, adjust = "tukey", infer=TRUE)
The values in the left-most column of the table indicate that the slopes are estimated at the mean value of TISSUE_MASS_CENTERED when comparing slopes between latitudinal regions.
cs.model.1a.log.p2 %>% emmeans(pairwise ~ TEMPERATURE*REGION, type = "response") %>% pairs(by = "TEMPERATURE") %>% summary(by = NULL, adjust = "tukey", infer=TRUE)
cs.model.1a.log.p2 %>% update(.~1+ REGION * as.factor(TEMPERATURE) + TISSUE_MASS_CENTERED + (1|FISH_ID)) %>%
emmeans(~REGION*TEMPERATURE, type = "response") %>% summary(infer=TRUE)
cs.model.1a.log.p2 %>% update(.~1+ REGION * as.factor(TEMPERATURE) + TISSUE_MASS_CENTERED + (1|FISH_ID)) %>%
emmeans(~REGION*TEMPERATURE, type = "response") %>% pairs(by ="REGION") %>% summary(infer=TRUE)
cs.emm <- cs.model.1a.log.p2 %>% emmeans(~REGION*TEMPERATURE)
eff_size(cs.emm, sigma = sigma(cs.model.1a.log.p2), edf=df.residual(cs.model.1a.log.p2))
## contrast
## Core TEMPERATURE34.6153846153846 - Leading TEMPERATURE34.6153846153846
## effect.size SE df lower.CL upper.CL
## -0.25 0.17 121 -0.586 0.086
##
## sigma used for effect sizes: 0.493
## Confidence level used: 0.95
Before beginning, always make sure that you are working in the correct directory.
Now we can import the data. Two different data frames are being imported. The first has all the enzyme wavelength absorption data for each sample, and the tissue.mass data file contains information pertaining to the tissue sample used for each sample. Later on these two data frames will be merged.
Both LDH and CS data frames have been previously cleaned and manipulated; therefore, the only remaining step is to join the data frames together and then make another column that contains the LDH:CS ratio.
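A minimal sketch of that join and ratio step, assuming the cleaned data frames are named ldh.data and cs.data, share FISH_ID and temperature as key columns, and hold per-sample activities in LDH_ACTIVITY and CS_ACTIVITY — these object and column names are assumptions, not taken from the document:

```r
library(dplyr)

# join the cleaned LDH and CS data frames on their shared identifiers
# (hypothetical key names), then compute the LDH:CS activity ratio
ldh.cs.data <- ldh.data %>%
  inner_join(cs.data, by = c("FISH_ID", "temperature")) %>%
  mutate(LCr = LDH_ACTIVITY / CS_ACTIVITY)
```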
ggplot(ldh.cs.data, aes(x =as.numeric(temperature), y= LCr, color = REGION)) +
geom_point() + geom_smooth(method = "lm", se=FALSE)
The model was fit using the glm and later the glmmTMB package in R. A number of different models were tested to determine which hypothesis and associated variables best predicted the LDH:CS ratio. Model fit was examined using AICc, BIC, and r-squared values. Additionally, models were examined via the validation diagnostics provided by the performance and DHARMa packages in R.
#--- base model ---#
ldh.cs.model.1 <- glm(LCr~ 1 + REGION*temperature + TISSUE_MASS_CENTERED,
family=gaussian(),
data = ldh.cs.data)
summary
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 17.0639389 | 5.1189094 | 3.3335106 | 0.0011310 |
| REGIONLeading | -5.5908886 | 7.4105238 | -0.7544526 | 0.4520079 |
| temperature | 0.4182537 | 0.1410369 | 2.9655625 | 0.0036250 |
| TISSUE_MASS_CENTERED | -0.7838687 | 0.2388626 | -3.2816715 | 0.0013405 |
| REGIONLeading:temperature | -0.0009074 | 0.2026775 | -0.0044771 | 0.9964350 |
summary
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 15.6040768 | 5.2950718 | 2.9469056 | 0.0038303 |
| REGIONLeading | -3.6272090 | 7.6695352 | -0.4729373 | 0.6370827 |
| temperature | 0.4343827 | 0.1463556 | 2.9679943 | 0.0035934 |
| REGIONLeading:temperature | -0.0114299 | 0.2104223 | -0.0543190 | 0.9567677 |
| model | df | AICc | BIC | r2 |
|---|---|---|---|---|
| ldh.cs.model.1 | 6 | 1028.425 | 1044.895 | 0.1980466 |
| ldh.cs.model.2 | 5 | 1036.968 | 1050.779 | 0.1312030 |
The model that contains TISSUE_MASS_CENTERED seems to do better than the model that leaves TISSUE_MASS_CENTERED out. Therefore we will move ahead with the model that contains TISSUE_MASS_CENTERED as a covariate.
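The comparison table above can be assembled along these lines — a sketch assuming the MuMIn package for AICc (note that the code defining ldh.cs.model.2 is not echoed in the document):

```r
library(MuMIn)  # AICc()

# side-by-side information criteria for the two candidate models
data.frame(
  model = c("ldh.cs.model.1", "ldh.cs.model.2"),
  df    = c(attr(logLik(ldh.cs.model.1), "df"),
            attr(logLik(ldh.cs.model.2), "df")),
  AICc  = c(AICc(ldh.cs.model.1), AICc(ldh.cs.model.2)),
  BIC   = c(BIC(ldh.cs.model.1), BIC(ldh.cs.model.2))
)
```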
Note that the linear model has already been created via model ldh.cs.model.1 in the previous section.
#--- second order polynomial ---#
ldh.cs.model.1.p2 <- glm(LCr ~ 1 + REGION*poly(temperature, 2) + TISSUE_MASS_CENTERED,
family=gaussian(),
data = ldh.cs.data)
#--- third order polynomial ---#
ldh.cs.model.1.p3 <- glm(LCr ~ 1 + REGION*poly(temperature, 3) + TISSUE_MASS_CENTERED,
family=gaussian(),
data = ldh.cs.data)
polynomial model comparisons
| model | df | AICc | BIC | r2 |
|---|---|---|---|---|
| ldh.cs.model.1 | 6 | 1028.425 | 1044.895 | 0.2031374 |
| ldh.cs.model.1.p2 | 8 | 1031.479 | 1053.157 | 0.2120904 |
| ldh.cs.model.1.p3 | 10 | 1034.187 | 1060.921 | 0.2239472 |
From our model comparison we can see that the model that treats temperature as a linear term performs the best. Therefore, moving forward we will use the linear model.
Fish were repeatedly sampled over four different temperatures; therefore, repeated sampling needs to be accounted for. To do this, random factors will be included within the model. There are a number of options that can be used for random factors, including: 1) accounting for repeated sampling of individuals, 2) accounting for repeated sampling of individuals nested within population, and 3) accounting for repeated sampling of individuals and populations without nesting. All three models will be run and compared.
ldh.cs.model.1a <- glmmTMB(LCr ~ 1 + REGION*temperature + TISSUE_MASS_CENTERED + (1|fish_id),
family=gaussian(),
data = ldh.cs.data,
REML = TRUE)
ldh.cs.model.1b <- glmmTMB(LCr ~ 1 + REGION*temperature + TISSUE_MASS_CENTERED + (1|POPULATION/fish_id),
family=gaussian(),
data = ldh.cs.data,
REML = TRUE)
ldh.cs.model.1c <- glmmTMB(LCr ~ 1 + REGION*temperature + TISSUE_MASS_CENTERED + (1|fish_id) + (1 + REGION|POPULATION),
family=gaussian(),
data = ldh.cs.data,
REML = TRUE) # convergence problem
random factor model comparisons
| model | df | AICc | BIC | r2m | r2c |
|---|---|---|---|---|---|
| ldh.cs.model.1a | 7 | 971.2014 | 990.2944 | 0.1892872 | 0.1892872 |
| ldh.cs.model.1b | 8 | 973.4758 | 995.1543 | 0.1892883 | 0.1892883 |
| ldh.cs.model.1c | 10 | NA | NA | 0.1892663 | 0.1892663 |
Model ldh.cs.model.1a appears to be the best model; however, there is little difference between the models regardless of how the random factors are arranged.
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.08 0.024 0.036 0.08 0.92 0.436 0.876 0.856 0.728 0.668 0.384 0.612 0.756 0.92 0.96 0.3 0.16 0.404 1 0.84 ...
## $uniformity
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.097178, p-value = 0.1748
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 0.9217, p-value = 0.648
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test with approximate
## expectations
##
## data: simulationOutput
## outliers at both margin(s) = 2, observations = 129, p-value = 0.2745
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
## 0.001883137 0.054882570
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 )
## 0.01550388
The ldh.cs.model.1a model looks good, and there seem to be no major violations of assumptions.
| Estimate | StdError | Zvalue | Pvalue | |
|---|---|---|---|---|
| (Intercept) | 15.8739308 | 4.0215424 | 3.9472245 | 0.0000791 |
| REGIONLeading | -4.7918249 | 5.8522314 | -0.8188030 | 0.4128988 |
| temperature | 0.4469781 | 0.0868763 | 5.1449970 | 0.0000003 |
| TISSUE_MASS_CENTERED | -0.7277932 | 0.4117470 | -1.7675738 | 0.0771322 |
| REGIONLeading:temperature | -0.0180225 | 0.1244268 | -0.1448444 | 0.8848338 |
| Chisq | Df | Pr(>Chisq) | |
|---|---|---|---|
| REGION | 1.8860596 | 1 | 0.1696470 |
| temperature | 49.6220472 | 1 | 0.0000000 |
| TISSUE_MASS_CENTERED | 3.1243173 | 1 | 0.0771322 |
| REGION:temperature | 0.0209799 | 1 | 0.8848338 |
| 2.5 % | 97.5 % | Estimate | |
|---|---|---|---|
| (Intercept) | 7.9918525 | 23.7560091 | 15.8739308 |
| REGIONLeading | -16.2619877 | 6.6783379 | -4.7918249 |
| temperature | 0.2767037 | 0.6172524 | 0.4469781 |
| TISSUE_MASS_CENTERED | -1.5348025 | 0.0792161 | -0.7277932 |
| REGIONLeading:temperature | -0.2618946 | 0.2258496 | -0.0180225 |
| Std.Dev.(Intercept)|fish_id | 7.8985911 | 14.0163425 | 10.5218514 |
| R2_conditional | R2_marginal | optional |
|---|---|---|
| 0.7184249 | 0.1892872 | FALSE |
ldh.cs.model.1a %>% emtrends(var = "temperature", type = "response") %>% pairs(by = "temperature") %>% summary(by = NULL, adjust = "tukey", infer=TRUE)
The values in the left-most column of the table indicate that the slopes are estimated at the mean value of TISSUE_MASS_CENTERED when comparing slopes between latitudinal regions.
ldh.cs.model.1a %>% emmeans(pairwise ~ temperature*REGION, type = "response") %>% pairs(by = "temperature") %>% summary(by = NULL, adjust = "tukey", infer=TRUE)
ldh.cs.model.1a %>% update(.~1+ REGION * as.factor(temperature) + TISSUE_MASS_CENTERED + (1|fish_id)) %>%
emmeans(~REGION*temperature, type = "response") %>% summary(infer=TRUE)
ldh.cs.model.1a %>% update(.~1+ REGION * as.factor(temperature) + TISSUE_MASS_CENTERED + (1|fish_id)) %>%
emmeans(~REGION*temperature, type = "response") %>% pairs(by ="REGION") %>% summary(infer=TRUE)
ldh.cs.emm <- ldh.cs.model.1a %>% emmeans(~REGION*temperature)
eff_size(ldh.cs.emm, sigma = sigma(ldh.cs.model.1a), edf=df.residual(ldh.cs.model.1a))
## contrast
## Core temperature34.7286821705426 - Leading temperature34.7286821705426
## effect.size SE df lower.CL upper.CL
## 0.706 0.516 127 -0.315 1.73
##
## sigma used for effect sizes: 7.675
## Confidence level used: 0.95
For initial details on the experiment performed please read the ReadMe file. In brief, Acanthochromis polyacanthus from two different regions on the Great Barrier Reef (GBR) were tested for metabolic performance at four different temperatures: 27\(^\circ\)C, 28.5\(^\circ\)C, 30\(^\circ\)C, and 31.5\(^\circ\)C. Fish used in this study were collected from two different regions, a low-latitude region (i.e., Cairns) and a high-latitude region (i.e., Mackay); within each region, fish were collected from a total of three different populations.
Immunocompetence was tested via phytohaemagglutinin (PHA) swelling assays at the same four experimental temperatures at which metabolic performance was tested. To perform the assay, fish were injected with 0.03 mL of PHA subcutaneously in the caudal peduncle. Thickness of the injection site was measured pre-injection as well as 18-24 hours post-injection. PHA produces a localized, cell-mediated response (e.g., inflammation, T-cell proliferation, etc.). The change in thickness between measurement periods was used as a proxy for immunocompetence.
Before beginning, always make sure that you are working in the correct directory.
Now we can import the data. Replace import.data with the PATH to your data file (i.e., import.data = "PATH TO MY FILE").
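A sketch of the import step, assuming the raw data are stored as a CSV file and readr is available (import.data holds the PATH string, as described above):

```r
library(readr)

# import.data is the PATH to the raw PHA data file
pha <- read_csv(import.data)
```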
Before the data can be analysed it is important to clean up the data file. Below a number of adjustments are made, primarily making sure that columns are being treated appropriately as either factors, numeric, or as time, as well as the renaming of some columns.
pha2 <- pha %>%
dplyr::rename(PHA_28.5 = PHA_285) %>%
mutate(FISH_ID = factor(FISH_ID),
POPULATION = factor(POPULATION),
REGION = factor(REGION),
TANK = factor(TANK),
PHA_28.5 = as.numeric(PHA_28.5),
MASS_CENTERED = scale(MASS, center = TRUE, scale = FALSE)) %>%
pivot_longer(cols = c(PHA_27,
PHA_28.5,
PHA_30,
PHA_31.5),
names_to = 'PHA',
values_to = 'IMMUNE_RESPONSE') %>%
separate(col = PHA,
into = c('TEST','TEMPERATURE'), sep = '_') %>%
filter(IMMUNE_RESPONSE >= 0.01) %>% # removing negative and near-zero swelling values
mutate(TEMPERATURE = as.numeric(TEMPERATURE))
Great! That is everything for data manipulation.
ggplot(pha2, aes(x=TEMPERATURE, y=IMMUNE_RESPONSE)) +
geom_violin(alpha = 0.5) + # four potential outliers but will keep for now
geom_point()
ggplot(pha2, aes(x=TEMPERATURE, y=IMMUNE_RESPONSE, fill = REGION, color = REGION)) +
geom_violin(alpha = 0.5) +
geom_point(position = position_jitterdodge(dodge.width = 0.9, jitter.width = 0), color = "black")
ggplot(pha2, aes(x=MASS_CENTERED, y=IMMUNE_RESPONSE, fill = REGION, color = REGION)) +
geom_point() + geom_smooth(method = "lm")
The model was fit using the glm and later the glmmTMB package in R. A number of different models were tested to determine which hypothesis and associated variables best predicted immune response. Model fit was examined using AICc, BIC, and r-squared values. Additionally, models were examined via the validation diagnostics provided by the performance and DHARMa packages in R.
#--- base model ---#
pha.1 <- glm(IMMUNE_RESPONSE ~ 1 + REGION * TEMPERATURE + MASS_CENTERED,
family=gaussian(),
data = pha2)
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 1.7185981 | 0.4222908 | 4.0697030 | 0.0000750 |
| REGIONLeading | -0.6055925 | 0.6198923 | -0.9769318 | 0.3301350 |
| TEMPERATURE | -0.0504956 | 0.0146466 | -3.4476083 | 0.0007294 |
| MASS_CENTERED | 3.0066564 | 2.2817687 | 1.3176868 | 0.1895654 |
| REGIONLeading:TEMPERATURE | 0.0207950 | 0.0213983 | 0.9718052 | 0.3326714 |
#--- model without MASS_CENTERED ---#
pha.2 <- glm(IMMUNE_RESPONSE ~ 1 + REGION * TEMPERATURE,
family=gaussian(),
data = pha2)
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 1.6908547 | 0.4227662 | 3.9995034 | 0.0000980 |
| REGIONLeading | -0.6042622 | 0.6213620 | -0.9724801 | 0.3323269 |
| TEMPERATURE | -0.0489398 | 0.0146335 | -3.3443600 | 0.0010344 |
| REGIONLeading:TEMPERATURE | 0.0195688 | 0.0214288 | 0.9132038 | 0.3625536 |
| model | df | AICc | BIC | r2 |
|---|---|---|---|---|
| pha.1 | 6 | -22.05659 | -4.195801 | 0.1032647 |
| pha.2 | 5 | -22.43443 | -7.482064 | 0.0939358 |
There is little difference between the two initial models; therefore, we will move forward with the model that has fewer terms.
Next we will test whether the variable TEMPERATURE performs best as a 1st (linear), 2nd (quadratic), or 3rd (cubic) order polynomial, as the relationship between temperature and immune response is predicted to be non-linear.
Note that the linear model has already been created via model pha.2 in the previous section.
pha.2.p2 <- glm(IMMUNE_RESPONSE ~ 1 + REGION * poly(TEMPERATURE, 2),
family=gaussian(),
data = pha2)
pha.2.p3 <- glm(IMMUNE_RESPONSE ~ 1 + REGION * poly(TEMPERATURE, 3),
family=gaussian(),
data = pha2)
| model | df | AICc | BIC | r2 |
|---|---|---|---|---|
| pha.2 | 5 | -22.43443 | -7.482064 | 0.0955802 |
| pha.2.p2 | 7 | -33.95206 | -13.211452 | 0.1814782 |
| pha.2.p3 | 9 | -36.46535 | -10.053268 | 0.2166317 |
From our model comparison we can see that the model improves when TEMPERATURE is modeled as a 2\(^{nd}\) or 3\(^{rd}\) order polynomial. The model that implements a 3\(^{rd}\) order polynomial performs the best, and therefore, we will be moving forward with this model.
Fish were repeatedly sampled over four different temperatures; therefore, repeated sampling needs to be accounted for. To do this, random factors will be included within the model. There are a number of options that can be used for random factors, including: 1) accounting for repeated sampling of individuals, 2) accounting for repeated sampling of individuals nested within population, and 3) accounting for repeated sampling of individuals and populations without nesting. All three models will be run and compared.
pha.2.p3a <- glmmTMB(IMMUNE_RESPONSE ~ 1 + REGION * poly(TEMPERATURE, 3) + (1|FISH_ID),
family=gaussian(),
data = pha2,
REML = TRUE)
pha.2.p3b <- glmmTMB(IMMUNE_RESPONSE ~ 1 + REGION * poly(TEMPERATURE, 3) + (1|POPULATION/FISH_ID),
family=gaussian(),
data = pha2,
REML = TRUE)
pha.2.p3c <- glmmTMB(IMMUNE_RESPONSE ~ 1 + REGION * poly(TEMPERATURE, 3) + (1|FISH_ID) + (1|POPULATION),
family=gaussian(),
data = pha2,
REML = TRUE)
| model | df | AICc | BIC | r2m | r2c |
|---|---|---|---|---|---|
| pha.2.p3a | 10 | -19.04376 | 10.15879 | 0.2090404 | 0.2090404 |
| pha.2.p3b | 11 | -18.95044 | 13.01158 | 0.2022862 | 0.2517016 |
| pha.2.p3c | 11 | -18.95044 | 13.01158 | 0.2022862 | 0.2517016 |
There is little difference between the models; however, the nested model does seem to do a bit better than the non-nested model that only includes (1|FISH_ID). There is no difference between the second and third models, so either could be used. Moving forward, the second model with the nested random effects will be used.
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.336 0.116 0.828 0.4 0.232 0.192 0.504 0.036 0.284 0.42 0.42 0.52 0.476 0.648 0.536 0.192 0.172 0.52 0.44 0.74 ...
## $uniformity
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.10325, p-value = 0.06743
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 0.94119, p-value = 0.656
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test with approximate
## expectations
##
## data: simulationOutput
## outliers at both margin(s) = 2, observations = 159, p-value = 0.3618
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
## 0.001526975 0.044697834
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 )
## 0.01257862
The pha.2.p3b model performs well; however, in the model validation performed by the performance package our modeled predictive lines aren't matching up with our observed data as well as we might hope. There are also some issues with the residuals within our DHARMa validations. Let's see if we can fix this by including some different link functions within our model.
pha.2.p3b <- glmmTMB(IMMUNE_RESPONSE ~ 1 + REGION * poly(TEMPERATURE, 3) + (1|POPULATION/FISH_ID),
family=gaussian(),
data = pha2,
REML = FALSE)
pha.2.p3b.log <- glmmTMB(IMMUNE_RESPONSE ~ 1 + REGION * poly(TEMPERATURE, 3) + (1|POPULATION/FISH_ID),
family=gaussian(link="log"),
data = pha2,
REML = FALSE)
pha.2.p3b.inv <- glmmTMB(IMMUNE_RESPONSE ~ 1 + REGION * poly(TEMPERATURE, 3) + (1|POPULATION/FISH_ID),
family=gaussian(link="inverse"),
data = pha2,
REML = FALSE)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.34 0.104 0.84 0.396 0.228 0.192 0.496 0.032 0.284 0.428 0.408 0.536 0.464 0.652 0.544 0.188 0.172 0.536 0.44 0.748 ...
## $uniformity
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.096377, p-value = 0.1043
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 1.0064, p-value = 0.912
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test with approximate
## expectations
##
## data: simulationOutput
## outliers at both margin(s) = 2, observations = 159, p-value = 0.3618
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
## 0.001526975 0.044697834
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 )
## 0.01257862
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.34 0.096 0.856 0.384 0.28 0.172 0.564 0.024 0.324 0.448 0.376 0.568 0.524 0.692 0.632 0.188 0.18 0.556 0.532 0.82 ...
## $uniformity
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.070591, p-value = 0.4065
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 1.0116, p-value = 0.88
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test with approximate
## expectations
##
## data: simulationOutput
## outliers at both margin(s) = 1, observations = 159, p-value = 1
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
## 0.0001592188 0.0345421401
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 )
## 0.006289308
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.352 0.128 0.9 0.372 0.236 0.212 0.596 0.032 0.312 0.424 0.42 0.596 0.54 0.688 0.636 0.208 0.196 0.548 0.484 0.84 ...
## $uniformity
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.080302, p-value = 0.2568
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 0.099362, p-value = 0.872
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test with approximate
## expectations
##
## data: simulationOutput
## outliers at both margin(s) = 0, observations = 159, p-value = 0.6421
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
## 0.00000000 0.02293344
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 )
## 0
Adding log or inverse link functions to the model does not help. In fact, it seems to make the model worse! From here we can try to experiment with different distributions. The first distribution that comes to mind is the Gamma distribution, as it can be helpful when dealing with skewed data, provided the data set contains no zeros and only positive values.
pha.2.p3b <- glmmTMB(IMMUNE_RESPONSE ~ 1 + REGION * poly(TEMPERATURE, 3) + (1|POPULATION/FISH_ID),
family=gaussian(),
data = pha2,
REML = FALSE)
pha.2.p3b.gamma <- glmmTMB(IMMUNE_RESPONSE~ 1 + REGION* poly(TEMPERATURE, 3) + (1|POPULATION/FISH_ID),
family=Gamma(link="log"),
data = pha2,
REML = FALSE)
| model | df | AICc | BIC | r2m | r2c |
|---|---|---|---|---|---|
| pha.2.p3b | 11 | -32.7787 | -0.8166702 | 0.2153506 | 0.2153506 |
| pha.2.p3b.gamma | 11 | -136.0719 | -104.1099081 | 0.2635317 | 0.2635317 |
From this model comparison we can see that the model fitted with the Gamma distribution performs much better than the model fitted with the gaussian distribution. Let's look at the model validation plots for our Gamma model.
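The validation plots referred to here can be drawn with DHARMa's plotting method — a sketch, assuming the Gamma fit pha.2.p3b.gamma from above:

```r
library(DHARMa)

# QQ plot of scaled residuals plus residual-vs-predicted plot for the Gamma fit
pha.gamma.resid <- simulateResiduals(pha.2.p3b.gamma, n = 250)
plot(pha.gamma.resid)
```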
Looks better
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.288 0.176 0.888 0.212 0.16 0.3 0.664 0.008 0.388 0.504 0.6 0.676 0.6 0.888 0.656 0.368 0.012 0.736 0.552 0.86 ...
## $uniformity
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.059044, p-value = 0.6364
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 0.7999, p-value = 0.4
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test with approximate
## expectations
##
## data: simulationOutput
## outliers at both margin(s) = 0, observations = 159, p-value = 0.6421
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
## 0.00000000 0.02293344
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 )
## 0
Looks much better!
The Gamma distribution does a decent job of modelling our data, so we can move forward with it and start to investigate the model.
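The coefficient, analysis-of-deviance, confidence-interval, and R-squared tables that follow can be generated along these lines (a sketch; the exact calls are assumed from the packages used in this workflow):

```r
library(car)          # Anova()
library(performance)  # r2()

summary(pha.2.p3b.gamma)             # coefficient table (Estimate, SE, z, p)
Anova(pha.2.p3b.gamma)               # Chisq test for each model term
confint(pha.2.p3b.gamma)             # confidence intervals for fixed and random effects
r2(pha.2.p3b.gamma)                  # marginal and conditional R2
```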
| Estimate | StdError | Zvalue | Pvalue | |
|---|---|---|---|---|
| (Intercept) | -1.3847661 | 0.0907111 | -15.2656703 | 0.0000000 |
| REGIONLeading | -0.1636031 | 0.1348218 | -1.2134770 | 0.2249475 |
| poly(TEMPERATURE, 3)1 | -4.8549609 | 1.1206614 | -4.3322281 | 0.0000148 |
| poly(TEMPERATURE, 3)2 | -2.6825503 | 1.1140906 | -2.4078385 | 0.0160473 |
| poly(TEMPERATURE, 3)3 | 1.1279198 | 1.1107719 | 1.0154378 | 0.3098972 |
| REGIONLeading:poly(TEMPERATURE, 3)1 | 1.3686326 | 1.6384364 | 0.8353285 | 0.4035328 |
| REGIONLeading:poly(TEMPERATURE, 3)2 | -2.4701223 | 1.6562860 | -1.4913622 | 0.1358664 |
| REGIONLeading:poly(TEMPERATURE, 3)3 | 0.3600876 | 1.6536806 | 0.2177492 | 0.8276245 |
| Chisq | Df | Pr(>Chisq) | |
|---|---|---|---|
| REGION | 1.421053 | 1 | 0.2332302 |
| poly(TEMPERATURE, 3) | 50.414204 | 3 | 0.0000000 |
| REGION:poly(TEMPERATURE, 3) | 2.933305 | 3 | 0.4020230 |
| 2.5 % | 97.5 % | Estimate | |
|---|---|---|---|
| (Intercept) | -1.5625566 | -1.2069755 | -1.3847661 |
| REGIONLeading | -0.4278489 | 0.1006427 | -0.1636031 |
| poly(TEMPERATURE, 3)1 | -7.0514170 | -2.6585049 | -4.8549609 |
| poly(TEMPERATURE, 3)2 | -4.8661279 | -0.4989728 | -2.6825503 |
| poly(TEMPERATURE, 3)3 | -1.0491531 | 3.3049927 | 1.1279198 |
| REGIONLeading:poly(TEMPERATURE, 3)1 | -1.8426438 | 4.5799090 | 1.3686326 |
| REGIONLeading:poly(TEMPERATURE, 3)2 | -5.7163832 | 0.7761385 | -2.4701223 |
| REGIONLeading:poly(TEMPERATURE, 3)3 | -2.8810668 | 3.6012420 | 0.3600876 |
| Std.Dev.(Intercept)|FISH_ID:POPULATION | 0.0000000 | Inf | 0.0000462 |
| Std.Dev.(Intercept)|POPULATION | 0.0000024 | 691.9122734 | 0.0410140 |
| R2_conditional | R2_marginal | optional |
|---|---|---|
| 0.265392 | 0.2635317 | FALSE |
Note that the random effects within this model explain very little variance and are largely non-informative.
pha.2.p3b.gamma %>%
  emtrends(var = "TEMPERATURE", type = "response") %>%
  pairs(by = "TEMPERATURE") %>%
  summary(by = NULL, adjust = "tukey", infer = TRUE)

SCROLL TO THE RIGHT –>
The numbers in the leftmost column of the table indicate that the slopes assume mean MASS_CENTERED and RESTING_TIME_SECONDS values when comparing slopes between latitudes.
pha.2.p3b.gamma %>%
  emmeans(pairwise ~ TEMPERATURE * REGION, type = "response") %>%
  pairs(by = "TEMPERATURE") %>%
  summary(by = NULL, adjust = "tukey", infer = TRUE)

pha.2.p3b.gamma %>%
  update(. ~ 1 + REGION * as.factor(TEMPERATURE) + (1 | POPULATION/FISH_ID)) %>%
  emmeans(~ REGION * TEMPERATURE, type = "response") %>%
  summary(infer = TRUE)

pha.2.p3b.gamma %>%
  update(. ~ 1 + REGION * as.factor(TEMPERATURE) + (1 | POPULATION/FISH_ID)) %>%
  emmeans(~ REGION * TEMPERATURE, type = "response") %>%
  pairs(by = "REGION") %>%
  summary(infer = TRUE)

immuno.emm <- pha.2.p3b.gamma %>% emmeans(~ REGION * TEMPERATURE)
eff_size(immuno.emm, sigma = sigma(pha.2.p3b.gamma), edf = df.residual(pha.2.p3b.gamma))

## contrast
## Core TEMPERATURE28.9339622641509 - Leading TEMPERATURE28.9339622641509
## effect.size SE df asymp.LCL asymp.UCL
## -0.105 0.265 Inf -0.626 0.415
##
## sigma used for effect sizes: 0.815
## Confidence level used: 0.95
For initial details on the experiment performed please read the ReadMe file. In brief, Acanthochromis polyacanthus from two different regions on the Great Barrier Reef (GBR) were tested for metabolic performance at four different temperatures: 27\(^\circ\)C, 28.5\(^\circ\)C, 30\(^\circ\)C, and 31.5\(^\circ\)C. Fish used in this study were collected from two different regions, low-latitude (i.e., Cairns) and high-latitude (i.e., Mackay); within each region fish were collected from a total of three different populations.
Blood samples for hematocrit sampling were collected two weeks after fish underwent respirometry testing at the final experimental temperature (31.5\(^\circ\)C). Hematocrit ratios were measured by comparing the amount of packed red blood cells to blood plasma after blood samples were collected via capillary tubes.
Before beginning, always make sure that you are working in the correct directory.
Now we can import the data. Replace import.data with the PATH to your data file. I have secretly labelled my PATH import.data (i.e., import.data = “PATH TO MY FILE”).
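A minimal read-in along these lines (a sketch; `import.data` holds the path as described above, and the factor conversions mirror the cleaning step below):

```r
library(tidyverse)

# 'import.data' is assumed to be a string holding the path to the data file
hema <- read_csv(import.data) %>%
  mutate(REGION = factor(REGION),       # grouping variables as factors
         POPULATION = factor(POPULATION))
```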
Before the data can be analysed it is important to clean up the data file. Below a number of adjustments are made, primarily making sure that columns are treated appropriately as either factors, numeric, or time, as well as renaming some columns.
hema <- hema %>%
mutate(PERC_RBC = as.numeric(PERC_RBC),
MASS = as.numeric(MASS),
MASS_CENTERED = scale(MASS, center = TRUE, scale = FALSE)) %>%
  drop_na(PERC_RBC)

Great! That is everything for data manipulation.
The models were fit using the glm function and later the glmmTMB package in R. A number of different models were tested to determine which hypothesis and associated variables best predicted hematocrit ratios. Model fit was examined using AICc, BIC, and r-squared values. Additional models were examined via the validation diagnostics provided by the performance and DHARMa packages in R.
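The comparison scores reported further below can be assembled along these lines (a sketch; `hema.1` and `hema.2` are the candidate models defined in this section):

```r
library(performance)  # compare_performance()

# AICc, BIC, and R2 side by side for the candidate hematocrit models
compare_performance(hema.1, hema.2, metrics = c("AICc", "BIC", "R2"))
```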
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 0.2236353 | 0.0121433 | 18.41628 | 0.0000000 |
| REGIONLeading | 0.0355744 | 0.0181554 | 1.95944 | 0.0578398 |
#--- region + mass hypothesis ---#
hema.2 <- glm(PERC_RBC ~ REGION + MASS_CENTERED,
              family = gaussian(),
              data = hema)

| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 0.2251300 | 0.0137982 | 16.3159093 | 0.0000000 |
| REGIONLeading | 0.0322334 | 0.0230904 | 1.3959651 | 0.1715176 |
| MASS_CENTERED | -0.0003688 | 0.0015402 | -0.2394485 | 0.8121545 |
| model | df | AICc | BIC | r2 |
|---|---|---|---|---|
| hema.1 | 3 | -107.0515 | -102.84462 | 0.0940123 |
| hema.2 | 4 | -104.6075 | -99.26924 | 0.0930529 |
There is little difference between the two initial models; however, the model that does not include MASS_CENTERED performs better according to the model comparison scores, therefore we will move forward with the first and simplest model.
Fish were only sampled once, therefore, there is no need to include individual as a random factor. However, we will test how the inclusion of POPULATION influences the model.
hema.1 <- glmmTMB(PERC_RBC ~ REGION,
family = gaussian(),
REML = TRUE,
data = hema)
hema.1b <- glmmTMB(PERC_RBC ~ REGION + (1|POPULATION),
family = gaussian(),
REML = TRUE,
data = hema)
hema.1c <- glmmTMB(PERC_RBC ~ REGION + (REGION|POPULATION),
family = gaussian(),
REML = TRUE,
data = hema) # convergence problem

## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; singular convergence (7). See vignette('troubleshooting'),
## help('diagnose')
| model | df | AICc | BIC | r2m | r2c |
|---|---|---|---|---|---|
| hema.1 | 3 | -93.24011 | -89.03324 | 0.0940123 | 0.0940123 |
| hema.1b | 4 | -90.73388 | -85.39565 | 0.0940123 | 0.0940123 |
| hema.1c | 6 | -85.58393 | -78.46809 | 0.0958933 | 0.1526837 |
The inclusion of random effects does not help explain additional variation, and therefore they will not be included in the model. Note the final model will be run using glm and not glmmTMB because we are no longer using a mixed model.
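Refitting the chosen fixed-effects structure with glm might look like this (a sketch, matching the model selected above):

```r
# Final model: fixed effect of REGION only, Gaussian errors, no random effects
hema.1 <- glm(PERC_RBC ~ REGION,
              family = gaussian(),
              data = hema)
```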
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.892 0.956 0.616 0.744 0.06 0.548 0.1 0.06 0.108 0.148 0.188 0.06 0.628 0.392 0.396 0.06 0.912 0.968 0.84 0.276 ...
## $uniformity
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.084842, p-value = 0.9473
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 0.97134, p-value = 0.952
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test with approximate
## expectations
##
## data: simulationOutput
## outliers at both margin(s) = 0, observations = 38, p-value = 1
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
## 0.00000000 0.09251276
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 )
## 0
The basic model looks good and passes the validation checks.
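The coefficient, likelihood-ratio, confidence-interval, and estimated-marginal-means output that follows can be produced along these lines (a sketch; `hema.1` is the final glm fit):

```r
library(car)      # Anova()
library(emmeans)  # emmeans()

summary(hema.1)                     # coefficient table
Anova(hema.1)                       # LR Chisq test for REGION
confint(hema.1)                     # profile confidence intervals
emmeans(hema.1, pairwise ~ REGION)  # region means and Core - Leading contrast
```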
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 0.2236353 | 0.0121433 | 18.41628 | 0.0000000 |
| REGIONLeading | 0.0355744 | 0.0181554 | 1.95944 | 0.0578398 |
| LR Chisq | Df | Pr(>Chisq) | |
|---|---|---|---|
| REGION | 3.839405 | 1 | 0.0500613 |
## Waiting for profiling to be done...
| 2.5 % | 97.5 % | |
|---|---|---|
| (Intercept) | 0.1998348 | 0.2474358 |
| REGIONLeading | -0.0000095 | 0.0711583 |
| R2 | |
|---|---|
| R2 | 0.0963721 |
## $emmeans
## REGION emmean SE df lower.CL upper.CL
## Core 0.224 0.0121 36 0.199 0.248
## Leading 0.259 0.0135 36 0.232 0.287
##
## Confidence level used: 0.95
##
## $contrasts
## contrast estimate SE df t.ratio p.value
## Core - Leading -0.0356 0.0182 36 -1.959 0.0578
MAXIMUM METABOLIC RATE and ABSOLUTE AEROBIC SCOPE showed significant differences in the temperature response curve between fish collected from the low-latitude region of the Great Barrier Reef and those from the sampled high-latitude (Mackay) region. However, no significant difference was seen in RESTING METABOLIC RATE. All oxygen consumption metrics showed significant positive relationships with temperature.
Similarly, the two enzymes analysed in this study showed significant positive relationships with temperature, but no significant differences between low- and high-latitude regions. Immunocompetence displayed a hill-shaped relationship with temperature, represented via a third-order polynomial model. Immunocompetence was also significantly related to temperature but, once again, no significant differences were observed between low- and high-latitude regions.
No significant differences were observed between low- and high-latitude regions with respect to hematocrit ratios; however, the difference between regions was marginally significant (p = 0.057) and deserves further investigation in future research.
Further analysis of results can be seen within the paper titled “[INSERT TITLE HERE]”, doi: XXXX
rmr.g2 <- ggplot(rmr.emm.df, aes(y=emmean, x=TEMPERATURE, color=REGION, linetype=REGION))+
geom_jitter(data=rmr.obs, aes(y=Fit, color=REGION), width=0.05, alpha = 0.3) +
stat_smooth(method = "lm",
formula =y ~ poly(x, 2, raw=TRUE)) +
geom_ribbon(aes(x=TEMPERATURE, ymin= lower.CL, ymax= upper.CL, fill = REGION),
alpha = 0.2, color=NA)+
scale_x_continuous(limits = c(26.9, 31.6), breaks = seq(27, 31.5, by = 1.5))+
scale_y_continuous(limits = c(2,12), breaks = seq(2, 12, by = 2)) +
theme_classic() + ylab(expression("RESTING METABOLIC RATE (MgO "[2]* " hr"^{-1} * ")")) + xlab("")+
scale_linetype_manual(values = c("solid", "dashed"), labels = c("Low-latitude","High-latitude")) +
scale_color_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude")) +
scale_fill_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude")) +
theme(legend.position = "top",
legend.text = element_text(size = 10),
legend.title = element_blank(),
axis.title = element_text(size =12),
axis.text = element_text(size=10))
#annotate("text", x=30, y= 11.5, label="P =0.51", fontface="italic", size=5)
mmr.g2 <- ggplot(mmr.emm.df, aes(y=emmean, x=TEMPERATURE, color=REGION, linetype=REGION))+
geom_jitter(data=mmr.obs, aes(y=Fit, color=REGION), width=0.05, alpha = 0.3) +
stat_smooth(method = "lm",
formula =y ~ poly(x, 2, raw=TRUE)) +
geom_ribbon(aes(x=TEMPERATURE, ymin= lower.CL, ymax= upper.CL, fill = REGION),
alpha = 0.2, color=NA) +
scale_x_continuous(limits = c(26.9, 31.6), breaks = seq(27, 31.5, by = 1.5))+
scale_y_continuous(limits = c(6,28), breaks = seq(6, 28, by = 2)) +
theme_classic() + ylab(expression("MAXIMUM OXYGEN CONSUMPTION (MgO " [2]* " hr"^{-1} * ")")) + xlab("")+
scale_linetype_manual(values = c("solid", "dashed"), labels = c("Low-latitude","High-latitude")) +
scale_color_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude")) +
scale_fill_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude")) +
theme(legend.position = 'none',
legend.text = element_text(size = 10),
legend.title = element_blank(),
axis.title = element_text(size =12),
axis.text = element_text(size=10))
#annotate("text", x=30, y= 27, label="P =0.0010", fontface="italic", size=5)
nas.g2 <- ggplot(nas.emm.df, aes(y=emmean, x=TEMPERATURE, color=REGION, linetype=REGION)) +
geom_jitter(data=nas.obs, aes(y=Fit, color=REGION), width=0.05, alpha = 0.3) +
stat_smooth(method = "lm",
formula =y ~ poly(x, 2, raw=TRUE)) +
geom_ribbon(aes(x=TEMPERATURE, ymin= lower.CL, ymax= upper.CL, fill = REGION),
alpha = 0.2, color=NA)+
scale_y_continuous(limits = c(4,20), breaks = seq(4, 20, by = 2)) +
scale_x_continuous(limits = c(26.9, 31.6), breaks = seq(27, 31.5, by = 1.5))+
theme_classic() + ylab(expression("ABSOLUTE AEROBIC SCOPE (MgO "[2]* " hr"^{-1} * ")")) +
scale_linetype_manual(values = c("solid", "dashed"), labels = c("Low-latitude","High-latitude")) +
scale_color_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude")) +
scale_fill_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude")) +
theme(legend.position = "none",
legend.text = element_text(size = 10),
legend.title = element_blank(),
axis.title = element_text(size =12),
axis.text = element_text(size=10))
#annotate("text", x=30, y= 19, label="P =0.0010", fontface="italic", size=5)

fig2 <- ggarrange(rmr.g2, mmr.g2, nas.g2,
nrow = 3,
ncol=1,
align = "v",
labels = c("A","B","C"),
common.legend = TRUE); fig2

## Warning: Removed 4 rows containing missing values (`geom_point()`).
pha.emm <- emmeans(pha.2.p3b.gamma, ~ TEMPERATURE*REGION,
at = list(TEMPERATURE = seq(from=27, to = 31.5, by=.1)),
type='response')
pha.emm.df=as.data.frame(pha.emm)
pha.obs <- pha2 %>%
mutate(Pred=predict(pha.2.p3b.gamma, re.form=NA),
Resid = residuals(pha.2.p3b.gamma, type='response'),
Fit = Pred + Resid)
pha.g2 <- ggplot(pha.emm.df, aes(y=response, x=TEMPERATURE, color = REGION, linetype=REGION)) +
stat_smooth(method = "lm", se=FALSE,
formula =y ~ poly(x, 3, raw=TRUE)) +
geom_ribbon(aes(x=TEMPERATURE, ymin= asymp.LCL, ymax= asymp.UCL, fill = REGION),
alpha = 0.2, color=NA) +
#scale_y_continuous(limits = c(0,0.9), breaks = seq(0, 0.9, by =0.15)) +
theme_classic() + ylab("PHA RESPONSE (mm)") +
scale_linetype_manual(values = c("solid", "dashed"), labels = c("Low-latitude","High-latitude")) +
scale_color_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude")) +
scale_fill_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude")) +
theme(legend.position = c(0.855,0.8),
legend.text = element_text(size = 10),
legend.title = element_blank(),
axis.title = element_text(size =12),
axis.text = element_text(size=10))
#annotate("text", x=30, y=0.495, fontface="italic", size=5, label="P =0.85")

#--- ldh ---#
ldh.emm <- emmeans(ldh.model.1.p2a, ~ temperature*REGION,
at = list(temperature = seq(from=20, to = 50, by=1)))
ldh.emm.df=as.data.frame(ldh.emm)
ldh.obs <- ldh.data %>%
mutate(Pred = predict(ldh.model.1.p2a, re.form=NA),
Resid = residuals(ldh.model.1.p2a, type = 'response'),
Fit = Pred + Resid) # observed = predicted + residual
cldh2 <- ggplot(ldh.emm.df, aes(y=emmean, x=temperature, color=REGION, fill=REGION, linetype=REGION)) +
stat_smooth(method = "lm", se=FALSE,
formula =y ~ poly(x, 3, raw=TRUE)) +
geom_ribbon(aes(x=temperature, ymin= lower.CL, ymax= upper.CL, fill = REGION),
alpha = 0.2, color=NA) +
geom_jitter(data=ldh.obs, aes(y=Fit, color=REGION), width=0.05, alpha = 0.3) +
scale_y_continuous(limits = c(0,250), breaks = seq(0, 250, by =50)) +
theme_classic() + ylab(expression("LDH ACTIVITY (U mg "^{-1}*" tissue)")) + xlab("TEMPERATURE") +
scale_linetype_manual(values = c("solid", "dashed"), labels = c("Low-latitude","High-latitude")) +
scale_color_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude")) +
scale_fill_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude")) +
theme(legend.position = c(0.80,0.2),
legend.text = element_text(size = 10),
legend.title = element_blank(),
axis.title = element_text(size =12),
axis.text = element_text(size=10))
#annotate("text", x=40, y=240, label="p =0.98", fontface = 'italic', size = 5)
#--- cs ---#
cs.emm <- emmeans(cs.model.1a.log.p2, ~ TEMPERATURE*REGION, type='response',
at = list(TEMPERATURE = seq(from=20, to = 50, by=1)),
)
cs.emm.df=as.data.frame(cs.emm)
cs.obs <- CS.data %>%
mutate(Pred = predict(cs.model.1a.log.p2, re.form=NA, type= 'response'),
Resid = residuals(cs.model.1a.log.p2, type = 'response'),
Fit = Pred + Resid) # observed = predicted + residual
cs.plot2 <- ggplot(cs.emm.df, aes(y=response, x=TEMPERATURE, color=REGION, fill=REGION, linetype=REGION)) +
stat_smooth(method = "lm", se=FALSE,
formula =y ~ poly(x, 2, raw=TRUE)) +
geom_jitter(data=cs.obs, aes(y=Fit, color=REGION), width=0.05, alpha = 0.3) +
geom_ribbon(aes(x=TEMPERATURE, ymin= lower.CL, ymax= upper.CL, fill = REGION),
alpha = 0.2, color=NA) +
scale_y_continuous(limits = c(0,10), breaks = seq(0, 10, by =2)) +
theme_classic() + ylab(expression("CS ACTIVITY (U mg "^{-1}*" tissue)")) + xlab("TEMPERATURE") +
scale_linetype_manual(values = c("solid", "dashed"), labels = c("Low-latitude","High-latitude")) +
scale_color_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude")) +
scale_fill_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude"))+
theme(legend.position = 'none',
legend.text = element_text(size = 10),
legend.title = element_blank(),
axis.title = element_text(size =12),
axis.text = element_text(size=10))
#annotate("text", x=40, y=9.8, label="p =0.15", fontface = 'italic', size = 5)
#--- ldh:cs ratio ---#
ldh.cs.emm <- emmeans(ldh.cs.model.1a, ~ temperature*REGION, type='response',
at = list(temperature = seq(from=20, to = 50, by=1)),
)
ldh.cs.emm.df=as.data.frame(ldh.cs.emm)
ldh.cs.obs <- ldh.cs.data %>%
mutate(Pred = predict(ldh.cs.model.1a, re.form=NA, type= 'response'),
Resid = residuals(ldh.cs.model.1a, type = 'response'),
Fit = Pred + Resid) # observed = predicted + residual
ldh.cs.plot2 <- ggplot(ldh.cs.emm.df, aes(y=emmean, x=temperature, color=REGION, fill=REGION, linetype=REGION)) +
stat_smooth(method = "lm", se=FALSE,
formula =y ~ poly(x, 2, raw=TRUE)) +
geom_jitter(data=ldh.cs.obs, aes(y=Fit, color=REGION), width=0.05, alpha = 0.3) +
geom_ribbon(aes(x=temperature, ymin= lower.CL, ymax= upper.CL, fill = REGION),
alpha = 0.2, color=NA) +
scale_y_continuous(limits = c(0,50), breaks = seq(0, 50, by =10)) +
theme_classic() + ylab("LDH:CS RATIO") + xlab("TEMPERATURE") +
scale_linetype_manual(values = c("solid", "dashed"), labels = c("Low-latitude","High-latitude")) +
scale_color_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude")) +
scale_fill_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude"))+
theme(legend.position = 'none',
legend.text = element_text(size = 10),
legend.title = element_blank(),
axis.title = element_text(size =12),
axis.text = element_text(size=10))
#annotate("text", x=40, y=48, label="p =0.91", fontface = 'italic', size = 5)

fig4 <- ggarrange(cldh2, cs.plot2, ldh.cs.plot2,
nrow = 3,
ncol=1,
align = "v",
labels = c("A","B","C"),
common.legend = TRUE)
fig4

hema.newdata <- hema.1 %>% ggemmeans(~REGION) %>%
as.data.frame() %>%
dplyr::rename(REGION = x)
obs <- hema %>%
mutate(Pred = predict(hema.1, re.form=NA),
Resid = residuals(hema.1, type = "response"),
Fit = Pred + Resid)
hema.plot <- ggplot(hema.newdata, aes(y=predicted, x=REGION, color=REGION, linetype=REGION)) +
geom_jitter(data=obs, aes(y=Fit, x=REGION, color =REGION),
width = 0.05, alpha=0.3)+
geom_pointrange(aes(ymin=conf.low,
ymax=conf.high),
shape = 19,
size = 1,
position = position_dodge(0.2)) +
scale_linetype_manual(values = c("solid", "dashed"), labels = c("Low-latitude","High-latitude")) +
scale_color_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude")) +
ylab("HEMATOCRIT RATIO") +
scale_x_discrete(name = "",
labels = c("Low-latitude","High-latitude"))+
theme_classic() +
theme(legend.position = 'top',
legend.text = element_text(size = 10),
legend.title = element_blank(),
axis.title = element_text(size =12),
axis.text = element_text(size=10))
#annotate("text", x=1.5, y=0.275, fontface="italic", size=5, label="P =0.057")

library(ggridges)
mass.distr <- resp4 %>% distinct(FISH_ID, .keep_all = TRUE) %>%
mutate(CHAMBER_RATIO = 1.5/DRY_WEIGHT) %>%
ggplot(aes(x=CHAMBER_RATIO, y=REGION, fill=REGION)) +
scale_fill_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude")) +
ylab("")+ scale_x_continuous(limits = c(20,150), breaks = seq(20, 150, by =20)) +
geom_density_ridges(scale = 2, jittered_points=TRUE, position = position_points_jitter(height = 0),
point_shape = '|', point_size = 3, point_alpha = 1, alpha = 0.7) +
theme_classic() +
theme(axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank())

## Picking joint bandwidth of 5.99

ggsave("C:/Users/jc527762/OneDrive - James Cook University/PhD dissertation/Data/Chapter1_LocalAdaptation/supplemental_figures/Supplemental_figure3.pdf",
       mass.distr, device="pdf", width=6.6, height = 5, units = "in", dpi=1200)

## Picking joint bandwidth of 5.99